Intelligent Optimization Methods for Cloud–Edge Collaborative Vehicular Networks via the Integration of Bayesian Decision-Making and Reinforcement Learning

Yu, Youjian; Song, Zhaowei; Zhu, Sifeng; Zhang, Qinghua

doi:10.3390/fi18040215


¹ School of Computer and Cyber Sciences, Communication University of China, Beijing 100024, China

² School of Computer and Information Engineering, Tianjin Chengjian University, Tianjin 300384, China

³ Library, Tianjin Chengjian University, Tianjin 300384, China

* Authors to whom correspondence should be addressed.

Future Internet 2026, 18(4), 215; https://doi.org/10.3390/fi18040215

Submission received: 6 March 2026 / Revised: 14 April 2026 / Accepted: 15 April 2026 / Published: 17 April 2026

(This article belongs to the Section Network Virtualization and Edge/Fog Computing)


Abstract

To improve vehicle user service quality and address data privacy and security issues in intelligent transportation vehicle networking systems, a three-tier communication architecture with cloud-edge-end collaboration was designed in this paper. A Bayesian decision criterion was utilized to divide user data segments into fine-grained slices based on their privacy levels, and differential privacy techniques were applied to protect the offloaded data. To achieve multi-objective optimization between user service quality and data privacy and security, the problem was formulated as a constrained Markov decision process. A communication model, a caching model, a latency model, an energy consumption model, and a data-fragment privacy protection model were designed. Additionally, a deep reinforcement learning algorithm based on the actor–critic approach was proposed for the collaborative and centralized training of multiple intelligent agents (CTMA-AC), enabling multi-objective optimization decision-making for the protection of offloaded private user data. Simulation experiments demonstrate that the proposed multi-agent collaborative privacy data offloading protection strategy can effectively safeguard private user data while ensuring high service quality.

Keywords: cloud-edge-end collaborative; privacy preservation; multi-objective optimization; deep reinforcement learning


1. Introduction

With the advancements in the Internet of Things, artificial intelligence (AI), and other technologies, numerous new applications with high security requirements have emerged in the field of telematics [1,2]. These applications often contain significant amounts of private data belonging to users [3]. For example, autonomous driving technology holds critical information such as users’ travel habits and route preferences, whereas facial recognition systems contain users’ payment information. The leakage of such private information can lead to significant property losses and even threaten users’ personal safety [4].

In intelligent transportation systems, user privacy and data security protection are usually coupled with user service experience, which makes it challenging to balance privacy and security against service quality [5]. User terminals usually offload tasks to edge nodes to reduce the burden of local processing and save system energy. However, because edge nodes are deployed at fixed locations, several of them can jointly infer private user data, which increases the risk of privacy leakage [6]. To reduce the latency and energy consumption associated with transmitting and processing private data, tasks can instead be processed locally in the vehicle. In this regard, the authors of [7] proposed a cloud-assisted fog computing framework for task offloading and service caching under dynamic service caching conditions to reduce the associated latency and energy consumption. The authors of [8] proposed an immune-optimization algorithm for computational offloading schemes from the perspective of multi-objective optimization, analyzing in detail the latency, energy consumption, and user experience of tasks during computational offloading. To enhance the protection of users’ private data, the authors of [9] utilized machine learning to generate information from urban computing data and applied differential privacy to protect it. The authors of [10] proposed an efficient computational offloading architecture based on personalized privacy protection and designed a personalized privacy sensitivity level calculation method to achieve hierarchical protection of personal information. Despite these efforts, balancing user service experience and the security of private user data in intelligent transportation systems remains a major challenge [11].

Offloading user data segments based on their privacy level improves the quality of service of the system [12]. In intelligent transportation systems, service data are often massive and contain private information about users. However, even application services that are closely related to user privacy may contain regular data fragments, and treating these regular fragments as private data consumes a large amount of system resources [13,14]. The authors of [15] utilized a blockchain framework and a Stackelberg game strategy to ensure secure task offloading for edge systems. However, these studies tend to treat the entire task dataset as private data without subdividing it into appropriate privacy levels before offloading, leading to inefficient resource utilization. Therefore, challenges remain in effectively segmenting user data based on privacy levels to improve user service quality.

In order to achieve lightweight protection of user data and improve user service quality, this paper proposes a three-layer cloud–edge–end collaborative architecture for intelligent transportation scenarios. On this basis, a unified privacy-aware task-offloading framework is developed at the data-fragment level. First, a Bayesian privacy-level classifier is introduced to assign each data fragment to one of the K privacy levels according to its privacy-related features, so that differentiated protection and offloading decisions can be applied. Then, a communication model, a caching model, a delay model, an energy-consumption model, and a K-level privacy-entropy model are constructed, and the corresponding multi-objective optimization problem is formulated. Finally, a multi-agent deep reinforcement learning algorithm, namely CTMA-AC, is employed to jointly optimize task offloading from the perspectives of system latency, energy consumption, and privacy-related performance. This paper makes the following contributions:

1. A three-layer cloud–edge–end collaborative architecture is established for intelligent transportation scenarios, together with the corresponding communication model, edge-caching model, delay model, and energy-consumption model. These models provide the system foundation for privacy-aware and service-oriented task-offloading optimization.
2. A fragment-level privacy-aware offloading framework is proposed by combining Bayesian privacy-level classification, differential-privacy-based perturbation for highly sensitive fragments, and a privacy-entropy metric for characterizing the dispersion of private fragments across collaborative nodes. On this basis, a lightweight multi-agent deep reinforcement learning strategy, namely CTMA-AC, is developed to optimize privacy-entropy-aware offloading decisions and reduce the risk of privacy leakage.
3. The effectiveness of the proposed scheme is validated through simulation experiments by comparing it with SAC, DQN, full-local, and random offloading baselines. The experimental results show that the proposed method achieves a better tradeoff among latency, energy consumption, and privacy-related performance in the considered cloud–edge–end collaborative intelligent transportation scenario.

2. Related Work

2.1. Existing Work on Privacy-Aware Task Offloading

Current research on offloading private data in connected car scenarios has developed significantly [16]. Differential privacy protection has received much attention in order to mask the initial characteristics of the data. Reference [17] considered clustering protocols for data analysis and suggested a generalized definition of differential privacy to achieve the desired level of local privacy guarantees. Reference [18] proposed a differential privacy defense approach to deal with attacks by adjusting the privacy budget to protect privacy, mask membership, and reconstruct data by modifying and normalizing the confidence score vector using differential privacy mechanisms. Reference [19] describes the development of an optimal privacy budget allocation algorithm for traffic smart card data, which optimizes the privacy budget of each prefix tree node to minimize the query error by building a query probability model and quantifying the probability of querying a trajectory location pair.

Joint learning frameworks can balance user personalization and privacy protection while enhancing data utilization. The authors of [20] proposed an adaptive privacy-preserving joint learning framework that accommodates different communication rounds and clients by using a leakage risk-aware privacy decomposition mechanism based on quantized, dynamically allocated privacy budgets. The authors of [21] proposed a unified joint learning framework to solve the privacy preservation and personalization problems, achieving good privacy preservation and personalization performance. The authors of [22] devised methods to allow edge hosts to add noise during local training to preserve privacy, by making joint decisions with the central server to design optimal resource allocation policies.

Homomorphic encryption is an important privacy-preserving technique because it enables computation to be performed directly on encrypted data. Reference [23] investigated the application of homomorphic encryption in cloud computing and proposed a verifiable homomorphic encryption scheme to enhance security and privacy preservation. Reference [24] combined homomorphic encryption with a one-time keyboard mechanism to determine the winner of the process while concealing all bidding information, thereby achieving good performance. Reference [25] proposed a secure multi-party k-means clustering algorithm, in which ciphertexts under different keys are used to protect private data during the clustering process.

2.2. Discussion of the Differences from Existing Studies

The aforementioned studies provide important inspiration for this work in terms of privacy-aware task offloading and multi-agent deep reinforcement learning. The main novelty of this paper lies in the joint integration of fragment-level privacy modeling, explicit privacy-entropy-aware optimization, centralized multi-agent collaborative learning, and a three-tier cloud–edge–end system architecture. Compared with existing studies, the main distinctions are as follows. First, in cloud–edge–end collaborative offloading scenarios, existing studies typically treat an entire task as a processing unit with a unified privacy level. In contrast, this paper introduces a Bayesian decision criterion to perform fine-grained privacy partitioning at the data-segment level, thereby enabling differentiated protection and offloading strategies for segments with different privacy sensitivities. Second, in addition to the two conventional optimization objectives, namely system latency and energy consumption, this paper explicitly incorporates K-level privacy entropy into the optimization objective. This metric measures how widely privacy-sensitive data segments are dispersed among cloud, edge, and end collaborative nodes, and thus strengthens the characterization of the system’s privacy protection performance. Third, this paper proposes a centralized-training-based multi-agent actor–critic reinforcement learning framework, termed CTMA-AC, which employs a global replay buffer to train agents in a centralized manner, thereby improving inter-agent coordination efficiency and decision-making performance in dynamic Internet of Vehicles environments.

3. System Modeling

3.1. Three-Tier Communication Architecture for Cloud-Edge-End Collaboration

In this paper, the considered scenario is an intelligent transportation system in which connected vehicles travel in a smart city. The system consists of a central cloud server, multiple edge servers, and intelligent vehicular terminals. Each edge server is equipped with communication base stations and is connected to roadside units such as radar sensors, traffic lights, cameras, and parked vehicles. These roadside units collect real-time traffic information and upload it to service entities such as vehicular terminals, edge servers, and the cloud server, thereby supporting decision-making in intelligent transportation systems.

The communication methods among service entities in the considered scenario include the following:

1. Vehicle-to-Vehicle communication (V2V): communication between intelligent connected vehicles.
2. Vehicle-to-Infrastructure communication (V2I): communication between vehicles and roadside infrastructure such as radar units and edge servers.
3. Wired communication: communication between edge servers.
4. Wireless communication: communication between vehicles and the central cloud server.

To clarify the applicability of the proposed privacy protection model, the adversarial assumptions considered in this paper are summarized as follows:

1. The central cloud server is treated as a trusted coordination entity responsible for global scheduling and resource management, and it is not considered an adversarial party in this work.
2. Edge servers are assumed to correctly provide computation and caching services, but they may attempt to infer user privacy from the data fragments they receive and store. Therefore, edge servers are modeled as honest-but-curious entities.
3. Multiple edge servers may jointly analyze the private fragments stored at different nodes, together with user offloading preferences and routing behavior, in order to reconstruct sensitive user information. This collusion threat is one of the main motivations for introducing the privacy entropy metric.
4. In the considered threat model, the attacker may observe stored data fragments, fragment distribution across edge servers, and offloading-related traffic patterns. However, model-parameter leakage and direct compromise of the cloud server are not considered as primary attack surfaces in this paper.
5. Vehicular terminals are regarded as normal service participants rather than malicious adversaries. Their role in this work is to generate tasks and offload data fragments according to the privacy-aware decision policy.
6. The privacy entropy defined in this paper is used to characterize the dispersion of privacy-sensitive fragments across collaborative nodes. A larger privacy entropy indicates that sensitive fragments are distributed more evenly, which reduces the risk that a single edge server or a small colluding set of edge servers can reconstruct complete private information. However, privacy entropy is used here as a heuristic indicator of resistance to aggregation-based inference attacks, rather than a strict closed-form success probability of a specific reconstruction attack.
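The dispersion idea in the last assumption can be illustrated with a small sketch. It uses plain Shannon entropy over the share of sensitive fragments held by each node as a stand-in; this is an assumption made for illustration only, and the K-level privacy entropy used in the paper's optimization model may be defined differently. The function name `dispersion_entropy` and the node labels are hypothetical.

```python
import math
from collections import Counter


def dispersion_entropy(placements):
    """Shannon entropy (in bits) of how sensitive fragments are spread over nodes.

    placements: one node identifier per sensitive fragment.
    NOTE: illustrative stand-in, not the paper's K-level privacy entropy.
    """
    total = len(placements)
    if total == 0:
        return 0.0
    counts = Counter(placements)
    entropy = 0.0
    for count in counts.values():
        share = count / total          # fraction of fragments on this node
        entropy -= share * math.log2(share)
    return entropy


# All four fragments on one edge server versus one fragment per server.
concentrated = ["edge_1", "edge_1", "edge_1", "edge_1"]
spread = ["edge_1", "edge_2", "edge_3", "edge_4"]

print(dispersion_entropy(concentrated))
print(dispersion_entropy(spread))
```

Under this stand-in, placing every sensitive fragment on the same server gives zero entropy, while an even spread over four servers gives the maximum of two bits, which matches the qualitative reading that a larger value means no single node holds the full picture.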

The intelligent transportation system model established in this paper is shown in Figure 1. The scenario includes a central cloud server, N edge servers, M vehicle terminals, and D structure-dense tasks, each of which contains I data fragments. To represent the intelligent transportation system model more intuitively, this paper mathematically abstracts the entity units in the model.

In Table 1, $CI_{cld}$ is a tuple of characteristic information about the cloud server, $ECDS$ is the collection of edge servers, $CI_{ECDS}$ is a tuple of characteristic information about an edge server, $UDS$ is the collection of vehicles, $CI_{UDS}$ is a tuple of characteristic information about a vehicle, $Dtask$ is the collection of application tasks, $TKS$ is the collection of data fragments, $CI_{TKS}$ is a tuple of feature information for a data fragment, and $PL$ is the collection of privacy levels into which data fragments are divided.

For the cloud server, $f_{cloud}^{calc}$ represents the computing resources and $p_{cloud}^{trans}$ represents the transmission power. For edge server $n$, $f_{ecd_n}^{calc}$ represents the computing resources, $s_{ecd_n}^{cach}$ represents the caching resources, $p_{ecd_n}^{calc}$ represents the computing power, and $loc_{ecd_n}^{x,y}$ represents the location information.

For vehicle $user_m$, $f_{user_m}^{calc}$ is the computing resources, $p_{user_m}^{calc}$ is the computing power, $v_{user_m}^{tran}$ is the traveling speed, and $loc_{user_m}^{t(x,y)}$ is the location information at time $t$.

For data fragment $task_{d,i}$, $d_{task_{d,i}}^{dav}$ is the amount of data, $f_{task_{d,i}}^{calc}$ is the computational resources needed to compute it, $s_{task_{d,i}}^{cach}$ is the cache space needed to cache it, and $t_{task_{d,i}}^{rest}$ is the maximum tolerable delay for its computation.

3.2. Privacy-Level Classification of Data Fragments

Privacy-sensitive data fragments can be kept in the user’s onboard computing unit for privacy protection, but this strategy is only suitable for small-scale tasks and may result in inefficient resource utilization for large task volumes. Therefore, in this work, data fragments with different privacy sensitivities are assigned different processing and offloading strategies according to their privacy levels. To enable fine-grained privacy-aware offloading, a Bayesian decision criterion is introduced to assign each data fragment to one of K predefined privacy levels before offloading. The privacy-level set is defined as $PL = \{1, 2, \dots, K\}$, where a larger level indicates stronger privacy sensitivity. In a vehicular context, low-level privacy data typically include ordinary sensing information and non-identifiable vehicle status data, medium-level privacy data may include location traces and driving behavior features, while high-level privacy data may include in-cabin video, facial information, biometric content, and payment-related records. In this paper, the Bayesian criterion serves as a front-end privacy-level assignment mechanism for subsequent protection and offloading decisions. The choice of K reflects a tradeoff between privacy granularity and classification complexity. Since misclassifying a highly sensitive fragment into a lower privacy level may lead to insufficient protection, high-level privacy fragments are handled more conservatively in the proposed framework.

Let $z_{d,i} \in \mathbb{R}^{F}$ denote the feature vector of fragment i in task d, where the extracted features may include identity relevance, location sensitivity, payment association, biometric or video sensitivity, and other semantic attributes related to privacy exposure. Let $y_{d,i} \in \{1, 2, \dots, K\}$ denote the privacy-level label of fragment i, where a larger level indicates stronger privacy sensitivity. For each privacy level $k \in \{1, 2, \dots, K\}$, let $\pi_k = P(y = k)$ denote the prior probability of class k, and let $p(z \mid y = k)$ denote the class-conditional probability density. According to Bayes’ rule, the posterior probability that the fragment with feature vector $z_{d,i}$ belongs to privacy level k is given by

$$P(y = k \mid z_{d,i}) = \frac{\pi_k \, p(z_{d,i} \mid y = k)}{\sum_{j=1}^{K} \pi_j \, p(z_{d,i} \mid y = j)}. \tag{1}$$

Under the Bayesian decision criterion with 0–1 loss, the optimal classification rule is the maximum a posteriori rule:

$$\hat{y}_{d,i} = \arg\max_{k \in \{1, \dots, K\}} P(y = k \mid z_{d,i}). \tag{2}$$

Since the denominator in Equation (1) is identical for all classes, Equation (2) can be equivalently written as

$$\hat{y}_{d,i} = \arg\max_{k \in \{1, \dots, K\}} \left[\ln \pi_k + \ln p(z_{d,i} \mid y = k)\right]. \tag{3}$$

In this paper, the feature vectors in each privacy class are assumed to follow a multivariate Gaussian distribution:

$$p(z_{d,i} \mid y = k) = \mathcal{N}(z_{d,i}; \mu_k, \Sigma_k), \tag{4}$$

where $\mu_k \in \mathbb{R}^{F}$ and $\Sigma_k \in \mathbb{R}^{F \times F}$ denote the mean vector and covariance matrix of privacy class k, respectively.

Substituting Equation (4) into Equation (3) and dropping the class-independent constant $-\frac{F}{2} \ln(2\pi)$, the decision rule can be written as

$$\hat{y}_{d,i} = \arg\max_{k \in \{1, \dots, K\}} \left[\ln \pi_k - \frac{1}{2} (z_{d,i} - \mu_k)^{T} \Sigma_k^{-1} (z_{d,i} - \mu_k) - \frac{1}{2} \ln |\Sigma_k|\right]. \tag{5}$$

The parameters $\mu_k$ and $\Sigma_k$ are estimated from the training samples in privacy class k:

$$\mu_k = \frac{1}{N_k} \sum_{r=1}^{N_k} z_r^{(k)}, \tag{6}$$

$$\Sigma_k = \frac{1}{N_k} \sum_{r=1}^{N_k} (z_r^{(k)} - \mu_k)(z_r^{(k)} - \mu_k)^{T}, \tag{7}$$

where $N_k$ is the number of training fragments in class k, and $z_r^{(k)}$ denotes the r-th training sample of privacy class k. The predicted privacy level $\hat{y}_{d,i}$ is then used by the subsequent offloading strategy to determine differentiated protection and execution decisions for data fragments with different privacy sensitivities. In this way, the Bayesian classifier serves as a lightweight and interpretable front-end module for privacy-aware fragment-level offloading.
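As a rough illustration of Equations (5)–(7), the following sketch fits one Gaussian per privacy level and applies the log-posterior decision rule. Several details are assumptions rather than choices stated in the paper: the priors are estimated as class frequencies, a small ridge term is added to each covariance so that it stays invertible, and the synthetic two-dimensional features and the function names `fit_gaussian_classes` and `predict_level` are hypothetical.

```python
import numpy as np


def fit_gaussian_classes(Z, y, K, ridge=1e-6):
    """Estimate (prior, mean, covariance) for each privacy level 1..K."""
    n_features = Z.shape[1]
    params = []
    for k in range(1, K + 1):
        Zk = Z[y == k]                      # training fragments of class k
        prior = len(Zk) / len(Z)            # assumption: empirical class frequency
        mu = Zk.mean(axis=0)                # Eq. (6)
        centered = Zk - mu
        sigma = centered.T @ centered / len(Zk)   # Eq. (7), 1/N_k normalization
        sigma = sigma + ridge * np.eye(n_features)  # assumption: ridge for stability
        params.append((prior, mu, sigma))
    return params


def predict_level(z, params):
    """Return the privacy level maximizing the score in Eq. (5)."""
    scores = []
    for prior, mu, sigma in params:
        diff = z - mu
        maha = diff @ np.linalg.solve(sigma, diff)   # (z-mu)^T Sigma^{-1} (z-mu)
        _, logdet = np.linalg.slogdet(sigma)         # ln|Sigma|
        scores.append(np.log(prior) - 0.5 * maha - 0.5 * logdet)
    return int(np.argmax(scores)) + 1                # levels are 1-based


# Synthetic, well-separated feature clusters standing in for three privacy levels.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])
Z = np.vstack([c + rng.normal(size=(50, 2)) for c in centers])
y = np.repeat([1, 2, 3], 50)

params = fit_gaussian_classes(Z, y, K=3)
print(predict_level(np.array([5.2, 4.9]), params))
```

Solving the linear system instead of inverting the covariance, and using `slogdet` for the log-determinant, keeps the score numerically stable when some feature dimensions have very small variance.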

3.3. Communication Model

In this work, it is assumed that the central cloud server, edge servers, vehicle terminals, and other vehicles can provide users with task offloading services. The time during which vehicle terminals pass through the coverage areas of edge servers is divided into $T$ time slots indexed by $t \in \{1, 2, \dots, t_0, \dots, T\}$, and the length of each time slot is $t_1$. When wireless communication occurs between the vehicles and other servers within the same time slot, the channel state is assumed to follow quasi-static flat Rayleigh fading. In neighboring time slots, the channel states are independent and time-varying. In addition, the wired links among collaborative edge servers are treated as relatively stable infrastructure connections with high transmission capacity. Therefore, to focus on the dominant latency components caused by wireless transmission and task computation, the delay and transmission rate of wired communication are not explicitly considered in this paper. This assumption is adopted for model tractability and may be less accurate in dense urban environments or under backhaul congestion. In the following communication-rate models, the bandwidth terms determine the rate units, while the ratios inside the logarithmic functions are interpreted as normalized effective SINR terms and are therefore dimensionless.

The communication rate $v_{cloud}^{user}$ between the user and the central cloud server is shown in Equation (8).

$$v_{cloud}^{user} = \frac{B_{cloud}^{user}}{ch_{cloud}^{user}} \log_2\left(1 + \frac{p_{trans}^{user} \times s}{\sigma^{2}}\right) \tag{8}$$

where $B_{cloud}^{user}$ denotes the channel bandwidth between the user terminal and the cloud server, $ch_{cloud}^{user}$ denotes the number of channels between the user terminal and the cloud server, and $s$ denotes a normalized channel attenuation factor for the user–cloud link. Therefore, $B_{cloud}^{user} / ch_{cloud}^{user}$ represents the effective bandwidth allocated to each channel.

The data transfer rate $v_{m,n}^{v2i}$ of the user terminal communicating with the edge server via the V2I communication method is expressed in Equation (9).

$$v_{m,n}^{v2i} = B_{m,n} \log_2\left(1 + \frac{p_{trans}^{user} \times g_{m,n}^{v2i}}{\sigma^{2} + d_{m,n}}\right) \tag{9}$$

where $B_{m,n}$ denotes the channel bandwidth between the user terminal and the edge server, $g_{m,n}^{v2i}$ denotes the normalized channel gain between the user terminal and the edge server, $\sigma^{2}$ denotes the Gaussian white noise power [8], $d_{m,n}$ denotes the aggregate interference term on the V2I link, and $p_{trans}^{user}$ is the transmission power of the vehicle terminal.

The data rate $v_{m,k}^{v2v}$ of V2V wireless communication between user terminals is shown in Equation (10).

$$v_{m,k}^{v2v} = B_{m,k} \log_2\left(1 + \frac{p_{trans}^{user} \times g_{m,k}^{v2v}}{\sigma^{2} + (L_{m,k})^{-2}}\right) \tag{10}$$

where $B_{m,k}$ denotes the communication bandwidth between user terminals, $g_{m,k}^{v2v}$ denotes the normalized channel gain, and $L_{m,k}$ denotes the Euclidean distance between users. Here, $(L_{m,k})^{-2}$ is used as an abstract distance-dependent attenuation term in the V2V communication model. Overall, the above communication parameters are introduced to characterize attenuation, interference, and distance effects in a tractable manner for offloading decision optimization, rather than to establish a complete physical-layer propagation model.
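The three rate expressions in Equations (8)–(10) can be evaluated directly. The sketch below is a minimal transcription under stated assumptions: the function and parameter names are hypothetical, bandwidth is taken in Hz so that rates come out in bit/s, and the numeric inputs are arbitrary illustrative values rather than the paper's simulation settings.

```python
import math


def cloud_rate(bandwidth_hz, num_channels, p_trans, s, noise):
    """Eq. (8): per-channel bandwidth times log2(1 + p*s / sigma^2)."""
    return (bandwidth_hz / num_channels) * math.log2(1 + p_trans * s / noise)


def v2i_rate(bandwidth_hz, p_trans, gain, noise, interference):
    """Eq. (9): interference d_{m,n} is added to the noise power."""
    return bandwidth_hz * math.log2(1 + p_trans * gain / (noise + interference))


def v2v_rate(bandwidth_hz, p_trans, gain, noise, distance):
    """Eq. (10): the term L^{-2} is added to the noise power, as in the model."""
    return bandwidth_hz * math.log2(1 + p_trans * gain / (noise + distance ** -2))


# Illustrative (made-up) normalized inputs.
print(cloud_rate(10e6, 10, p_trans=1.0, s=3.0, noise=1.0))
print(v2i_rate(1e6, p_trans=1.0, gain=3.0, noise=0.5, interference=0.5))
print(v2v_rate(1e6, p_trans=1.0, gain=7.0, noise=0.75, distance=2.0))
```

Because the ratios inside the logarithms are treated as dimensionless, only the bandwidth argument fixes the unit of the returned rate.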

3.4. Edge Caching Model

The cache state of the collaborative edge server is represented by the caching matrix $S = \{task_{d,i} \mid c_{(n,d,i)}\}$. The binary variable $c_{(n,d,i)} = 1$ indicates that $task_{d,i}$ has been cached at the collaborative edge server in a directly serviceable form; otherwise, $task_{d,i}$ is not cached.

The cache-service delay $t_{task_{d,i}}^{cache}$ for obtaining $task_{d,i}$ from the collaborative edge server is shown in Equation (11).

$$t_{task_{d,i}}^{cache} = \begin{cases} \dfrac{d_{task_{d,i}}^{dav}}{v_{m,n}^{v2i}}, & c_{(n,d,i)} = 1 \\ 0, & c_{(n,d,i)} = 0 \end{cases} \tag{11}$$

3.5. Delay and Energy Model

In this paper’s scenario, local vehicle terminals, edge servers, cloud servers, and other vehicles (parked vehicles and other moving vehicles) can provide task computation services to users. We consider the delay and energy consumption during the offloading process of intensive tasks in the intelligent transportation scenario. Therefore, to focus on the dominant overhead introduced by wireless transmission and computation execution, we ignore the backhauling delay after data fragment computation. Additionally, since a wired connection is used between the collaborative edge servers, we neglect the transmission delay of tasks among these servers [8]. This assumption is mainly adopted for model tractability and may be less accurate in dense urban environments or under constrained backhaul conditions.

Latency of data fragments processed locally

If the data fragment is computed locally, the computational delay $t_{task_{d,i}}^{local}$ is shown in Equation (12).

$$t_{task_{d,i}}^{local} = \frac{f_{task_{d,i}}^{calc}}{f_{user_m}^{calc}} \tag{12}$$

The energy consumption of the data fragment for local execution $e_{task_{d,i}}^{local}$ is calculated as shown in Equation (13).

$$e_{task_{d,i}}^{local} = p_{user_m}^{calc} \times \frac{f_{task_{d,i}}^{calc}}{f_{user_m}^{calc}} \tag{13}$$

In the current model, the local execution delay is treated as the expected execution time under a given terminal computing capability. This abstraction captures the average local computation overhead of a data fragment. In practical vehicular environments, however, stochastic factors such as processor load variation, operating-system scheduling, and background-task contention may introduce fluctuations into the actual local service time. Such stochasticity may further affect the state-transition and reward distributions observed by the DRL agents, thereby influencing the robustness and generalization of the learned offloading policy.

Latency of data fragments computed at edge servers

The latency of data fragment execution at the edge server is divided into the transmission latency of sending the data fragment to the edge server and the computation latency of the data fragment at the edge server. When the cache state of the collaborative edge server is $c_{(n,d,i)} = 1$, the corresponding data fragment is assumed to be already available in a directly serviceable form, and thus no additional edge-side computation delay is incurred.

The computation time $t_{task_{d,i}}^{ecds}$ and energy consumption $e_{task_{d,i}}^{ecds}$ of the data fragment at the edge server are shown in Equations (14) and (15).

$$t_{task_{d,i}}^{ecds} = \begin{cases} \dfrac{d_{task_{d,i}}^{dav}}{v_{m,n}^{v2i}} + \dfrac{f_{task_{d,i}}^{calc}}{f_{ecd_n}^{calc}}, & c_{(n,d,i)} = 0 \\ \dfrac{d_{task_{d,i}}^{dav}}{v_{m,n}^{v2i}}, & c_{(n,d,i)} = 1 \end{cases} \tag{14}$$

$$e_{task_{d,i}}^{ecds} = \begin{cases} \dfrac{d_{task_{d,i}}^{dav}}{v_{m,n}^{v2i}} \times p_{trans}^{user} + \dfrac{f_{task_{d,i}}^{calc}}{f_{ecd_n}^{calc}} \times p_{user_m}^{calc}, & c_{(n,d,i)} = 0 \\ \dfrac{d_{task_{d,i}}^{dav}}{v_{m,n}^{v2i}} \times p_{trans}^{user}, & c_{(n,d,i)} = 1 \end{cases} \tag{15}$$
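The two cases in Equations (14) and (15) differ only in whether the edge-side computation term is present. A minimal sketch of that branching is given below; the function name `edge_delay_energy` and its parameter names are hypothetical, the computing-power factor is passed in as `p_calc` rather than fixed to a particular device, and the numbers are illustrative only.

```python
def edge_delay_energy(data_bits, cycles, rate_v2i, f_edge, p_trans, p_calc, cached):
    """Delay and energy of one fragment served by an edge server, Eqs. (14)-(15).

    cached=True corresponds to c_(n,d,i) = 1: the fragment is already available
    in serviceable form, so only the V2I transmission term remains.
    """
    t_tx = data_bits / rate_v2i                   # transmission time over V2I
    t_cmp = 0.0 if cached else cycles / f_edge    # edge computation time
    delay = t_tx + t_cmp
    energy = t_tx * p_trans + t_cmp * p_calc
    return delay, energy


# Illustrative values: 2 Mbit fragment, 1 Mbit/s link, 1e9 cycles on a 2 GHz edge CPU.
print(edge_delay_energy(2e6, 1e9, 1e6, 2e9, p_trans=0.5, p_calc=4.0, cached=False))
print(edge_delay_energy(2e6, 1e9, 1e6, 2e9, p_trans=0.5, p_calc=4.0, cached=True))
```

With these inputs a cache hit removes the 0.5 s computation term and its energy, leaving only the transmission cost.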

Latency of data fragments computed at other vehicle terminals

Considering the privacy of some data fragments, in this paper, when a task is offloaded to other vehicles (parked vehicles or other moving vehicles) for computation, only the data fragments that do not contain user privacy are offloaded to other vehicle terminals. In this case, the task computation time $t_{task_{d,i}}^{v2v}$ and energy consumption $e_{task_{d,i}}^{v2v}$ are shown in Equations (16) and (17).

$$t_{task_{d,i}}^{v2v} = \frac{d_{task_{d,i}}^{dav}}{v_{m,k}^{v2v}} + \frac{f_{task_{d,i}}^{calc}}{f_{user_k}^{calc}} \tag{16}$$

$$e_{task_{d,i}}^{v2v} = \frac{d_{task_{d,i}}^{dav}}{v_{m,k}^{v2v}} \times p_{trans}^{user} + \frac{f_{task_{d,i}}^{calc}}{f_{user_k}^{calc}} \times p_{user_k}^{calc} \tag{17}$$

where $p_{user_k}^{calc}$ denotes the computing power of the target vehicle terminal.

Latency of data segments computed on central cloud servers

If $task_{d,i}$ is computed at the central cloud server, the computation delay $t_{task_{d,i}}^{cloud}$ and energy consumption $e_{task_{d,i}}^{cloud}$ are shown in Equations (18) and (19).

$$t_{task_{d,i}}^{cloud} = \frac{d_{task_{d,i}}^{dav}}{v_{cloud}^{user}} + \frac{f_{task_{d,i}}^{calc}}{f_{cloud}^{calc}} \tag{18}$$

$$e_{task_{d,i}}^{cloud} = \frac{d_{task_{d,i}}^{dav}}{v_{cloud}^{user}} \times p_{trans}^{user} + \frac{f_{task_{d,i}}^{calc}}{f_{cloud}^{calc}} \times p_{cloud}^{calc} \tag{19}$$

Let the total task offloading delay be

T (X)

and the total energy consumption be

E (X)

, as shown in Equations (20) and (21). For each data fragment, the incurred delay and energy consumption are determined by its selected execution mode. Therefore, the total delay and total energy should be understood as the aggregation of the corresponding overheads of all data fragments under their actual offloading decisions.

T (X) = \sum_{d = 1}^{D} \sum_{i = 1}^{I} (t_{t a s k_{d, i}}^{c a c h e} + t_{t a s k_{d, i}}^{l o c a l} + t_{t a s k_{d, i}}^{e c d s} + t_{t a s k_{d, i}}^{v 2 v} + t_{t a s k_{d, i}}^{c l o u d})

(20)

E (X) = \sum_{d = 1}^{D} \sum_{i = 1}^{I} (e_{t a s k_{d, i}}^{c a c h e} + e_{t a s k_{d, i}}^{l o c a l} + e_{t a s k_{d, i}}^{e c d s} + e_{t a s k_{d, i}}^{v 2 v} + e_{t a s k_{d, i}}^{c l o u d})

(21)
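The remark that each fragment incurs only the overhead of its selected execution mode can be made concrete with a short sketch. The Python below is an illustration under stated assumptions: the dictionary layout, the mode labels, and the sample values are hypothetical, and only the term for the chosen mode is added to the totals.

```python
MODES = ("cache", "local", "ecds", "v2v", "cloud")


def total_delay_energy(fragments):
    """Aggregate delay and energy, counting only each fragment's selected mode.

    `fragments` is an iterable of dicts with keys:
      "mode":   one of MODES (the offloading decision for that fragment)
      "delay":  dict mapping mode -> delay under that mode
      "energy": dict mapping mode -> energy under that mode
    """
    total_t = 0.0
    total_e = 0.0
    for frag in fragments:
        mode = frag["mode"]
        if mode not in MODES:
            raise ValueError(f"unknown execution mode: {mode!r}")
        total_t += frag["delay"][mode]
        total_e += frag["energy"][mode]
    return total_t, total_e


# Two hypothetical fragments: one executed locally, one at the cloud.
fragments = [
    {"mode": "local", "delay": {"local": 1.0, "cloud": 0.4},
     "energy": {"local": 2.0, "cloud": 0.5}},
    {"mode": "cloud", "delay": {"local": 3.0, "cloud": 0.6},
     "energy": {"local": 4.0, "cloud": 0.7}},
]
T, E = total_delay_energy(fragments)
```

This is equivalent to evaluating Equations (20) and (21) with the terms of all non-selected modes set to zero for each fragment.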

3.6. Privacy Data Fragment Protection Model

In this section, we consider the possibility that edge server nodes may be subject to monitoring or inference attacks, and that collaborative edge nodes may jointly infer user privacy through observed offloading preferences and received data fragments. To reduce the risk of privacy leakage, a perturbation mechanism based on differential privacy is applied to highly sensitive data fragments before offloading, and the target execution server is selected according to the privacy level of each fragment in a directional manner [26,27]. In the proposed framework, higher privacy levels correspond to more conservative protection strategies. In particular, fragments with higher privacy sensitivity are subject to stronger perturbation or stricter offloading constraints, since misclassifying a highly private fragment into a lower privacy level may lead to insufficient protection and increased privacy leakage risk. Differential privacy and privacy entropy play complementary roles in this paper: the former provides perturbation-based protection for highly sensitive fragments, while the latter characterizes the dispersion of private fragments across collaborative edge nodes. Therefore, privacy entropy is used as a heuristic indicator for dispersion-based protection rather than a formal privacy guarantee equivalent to differential privacy.

For highly sensitive data fragments, we adopt an

(ε, δ)

-differential privacy protection mechanism. Let two fragment datasets D and

D^{'}

be adjacent if they differ in at most one sensitive record of a single offloaded fragment. A randomized mechanism

M

satisfies

(ε, δ)

-differential privacy if, for any adjacent datasets D and

D^{'}

and any measurable output set

S

, the following condition holds:

Pr [M (D) \in S] \leq e^{ε} Pr [M (D^{'}) \in S] + δ .

(22)

For a fragment query or feature function

f (\cdot)

with

ℓ_{2}

-sensitivity

Δ_{f}

, the Gaussian mechanism releases a perturbed output

\tilde{f} (D) = f (D) + N (0, σ^{2} Δ_{f}^{2}),

(23)

where the noise scale

σ

is determined by the privacy parameters

(ε, δ)

. In general, a smaller

ε

or

δ

corresponds to stronger privacy protection and thus requires a larger perturbation scale. In the proposed framework, fragments with higher privacy levels are assigned stricter privacy budgets, while fragments with lower privacy sensitivity are assigned relatively looser budgets in order to preserve service utility.
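The paper does not state how the noise scale is derived from the privacy parameters. As one possible choice, the sketch below uses the classical Gaussian-mechanism calibration, in which the noise multiplier is sqrt(2 ln(1.25/δ))/ε and which is valid for 0 < ε < 1. This calibration, the function names, and the sample values are assumptions made for illustration, not the authors' stated method.

```python
import math
import random


def gaussian_noise_multiplier(epsilon, delta):
    """Classical calibration: sigma = sqrt(2 ln(1.25/delta)) / epsilon, for 0 < epsilon < 1."""
    if not (0.0 < epsilon < 1.0) or not (0.0 < delta < 1.0):
        raise ValueError("classical bound requires 0 < epsilon < 1 and 0 < delta < 1")
    return math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def perturb(value, sensitivity, epsilon, delta, rng=random):
    """Release value + N(0, sigma^2 * sensitivity^2), matching the form of Eq. (23)."""
    sigma = gaussian_noise_multiplier(epsilon, delta)
    return value + rng.gauss(0.0, sigma * sensitivity)


# A stricter budget (smaller epsilon) yields a larger noise multiplier.
strict = gaussian_noise_multiplier(0.2, 1e-5)
loose = gaussian_noise_multiplier(0.8, 1e-5)
```

Under this calibration, assigning a stricter budget to a higher privacy level translates directly into a larger standard deviation of the added noise.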

Assume that the vehicle terminal offloads private data fragments whose privacy level is no lower than level k (where

k \geq 1

) to edge servers. Let

d_{load}^{k}

denote the total amount of private data at privacy level k, and let

d_{load}^{k, n}

denote the amount of level-k private data offloaded to edge server

e c d_{n}

. Then, the total amount of offloaded private data is

d_{load}^{sum} = \sum_{k = 1}^{K} \sum_{n = 1}^{N} d_{load}^{k, n} .

(24)

The privacy entropy of edge server

e c d_{n}

at time slot t is defined as

Θ_{n}^{t} = - \sum_{k = 1}^{K} k \, p_{load}^{k, n} \log_{2} (p_{load}^{k, n}),

(25)

where

p_{l o a d}^{k, n}

denotes the proportion of level-k private data fragments offloaded to edge server

e c d_{n}

. The privacy entropy is used to characterize the dispersion of privacy-sensitive fragments across collaborative edge nodes. A higher value of

Θ_{n}^{t}

indicates that private fragments are more evenly distributed among edge nodes, which makes aggregation-based inference more difficult. The linear weight k reflects the higher importance of fragments with higher privacy levels. This metric mainly measures fragment dispersion and should therefore be interpreted as a heuristic risk indicator rather than a formal privacy guarantee. In repeated offloading scenarios, the privacy budget is consumed cumulatively over time, and a stronger perturbation improves privacy protection but may also reduce data utility and indirectly affect latency and energy performance. For this reason, the proposed method jointly considers perturbation-based protection together with latency, energy consumption, and privacy entropy in the subsequent optimization process [16].
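Equation (25) can be evaluated directly once the proportions are known. The following is a minimal Python sketch; it takes the proportions as given input because their exact normalization is not restated here, and it treats terms with zero proportion as contributing zero. The function name and the sample values are illustrative.

```python
import math


def privacy_entropy(proportions):
    """Level-weighted entropy of Eq. (25) for one edge server.

    `proportions` maps privacy level k (1..K) to the proportion of level-k
    private data offloaded to this server. Terms with p == 0 contribute 0.
    """
    theta = 0.0
    for level, p in proportions.items():
        if not 0.0 <= p <= 1.0:
            raise ValueError("proportions must lie in [0, 1]")
        if p > 0.0:
            theta -= level * p * math.log2(p)
    return theta


# Hypothetical server holding equal shares of level-1 and level-2 fragments.
theta = privacy_entropy({1: 0.5, 2: 0.5})
```

For this example the level-1 term contributes 0.5 and the level-2 term contributes 1.0, so the result is 1.5; the linear weight k is what makes the higher level count twice as much.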

3.7. Multi-Objective Optimization Problem Model

In this paper, the privacy-preserving data offloading problem in the intelligent transportation cloud–edge–end collaborative scenario is formulated as a multi-objective optimization problem that jointly considers system latency, energy consumption, and privacy protection. Latency and energy characterize the service efficiency of task execution, while privacy protection is reflected by both the dispersion of sensitive fragments across collaborative nodes and the perturbation mechanism applied to highly sensitive fragments. Because the problem couples offloading decisions with resource constraints and privacy-aware protection mechanisms, it is regarded as NP-hard and is difficult to solve exactly in polynomial time. In addition, the differential-privacy-based perturbation applied to highly sensitive fragments introduces a utility–privacy tradeoff: stronger perturbation improves privacy protection, but it may also reduce the utility of the offloaded data and indirectly affect task completion efficiency, thereby influencing latency and energy consumption. Therefore, to satisfy the demand for high-quality intelligent transportation services while strengthening the protection of user privacy, this paper takes minimizing latency, minimizing energy consumption, and maximizing privacy entropy as the optimization objectives. Here, the latency and energy terms denote the overall overhead incurred by all data fragments under their actual offloading decisions, while the privacy entropy characterizes the privacy-preservation effect of fragment dispersion and perturbation. The resulting multi-objective optimization problem is defined as follows:

\begin{matrix} \min T, \; \min E, \; \max Θ_{n}^{t} \\ \text{s.t.} \\ \text{C1}: s_{task_{d, i}}^{cache} \leq s_{ecd_{n}}^{cache}, \; \forall n \in N, \forall d \in D, \forall i \in I, \\ \text{C2}: f_{task_{d, i}}^{calc} \leq f_{user_{m}}^{calc}, \; \forall d \in D, \forall i \in I, \forall m \in M, \\ \text{C3}: f_{task_{d, i}}^{calc} \leq f_{ecd_{n}}^{calc}, \; \forall n \in N, \forall d \in D, \forall i \in I, \\ \text{C4}: t_{task_{d, i}}^{rest} \leq σ, \; \forall d \in D, \forall i \in I . \end{matrix}

(26)

where Constraint C1 specifies that the cache resources required for storing any data fragment must not exceed the maximum cache capacity of the edge server. Constraint C2 indicates that the computational resources required for processing any data fragment locally must not exceed the maximum computational capability of the vehicle terminal. Constraint C3 states that the computational resources required for processing any data fragment at the edge server must not exceed the maximum computational capability of the edge server. Constraint C4 specifies that the tolerable response delay of each data fragment must not exceed the system latency threshold

σ

. These constraints together define the feasible decision region of the privacy-aware offloading problem. In the subsequent DRL framework, Constraints C1–C4 are enforced through feasibility checking and penalty regulation. If an action violates any of these constraints, it is treated as infeasible and receives a strong negative reward, so that invalid decisions are suppressed during policy learning.
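The feasibility checking and penalty regulation described above can be sketched as follows. This is an illustration under explicit assumptions: the mapping from execution mode to the constraints that apply (C1 and C3 for edge execution, C2 for local execution, C4 always), the mode labels, and the penalty value of -10 are all hypothetical choices, since the text only specifies a "strong negative reward".

```python
from dataclasses import dataclass


@dataclass
class FragmentDemand:
    cache_size: float      # cache resources needed to store the fragment
    cpu_demand: float      # computational resources needed to process it
    response_delay: float  # response delay of the fragment


def violated_constraints(frag, mode, edge_cache, user_cpu, edge_cpu, delay_threshold):
    """Return the labels of the constraints from (26) that the action violates."""
    violations = []
    if mode == "ecds" and frag.cache_size > edge_cache:
        violations.append("C1")
    if mode == "local" and frag.cpu_demand > user_cpu:
        violations.append("C2")
    if mode == "ecds" and frag.cpu_demand > edge_cpu:
        violations.append("C3")
    if frag.response_delay > delay_threshold:
        violations.append("C4")
    return violations


def shaped_reward(base_reward, violations, penalty=-10.0):
    """Infeasible actions receive the (assumed) fixed penalty instead of the reward."""
    return penalty if violations else base_reward


frag = FragmentDemand(cache_size=5.0, cpu_demand=3.0, response_delay=0.8)
edge_violations = violated_constraints(frag, "ecds", 4.0, 2.0, 10.0, 1.0)
```

A fixed penalty is the simplest option; a penalty that grows with the number or size of violations would be an alternative design, but the paper does not say which form is used.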

4. Deep Reinforcement Learning Algorithm Design

In this paper, we address the continuous time-varying dynamics of collaborative cloud–edge–end systems in intelligent transportation. We formulate the problem as a multi-agent stochastic game, in which agents share learned strategies with the aim of maximizing system benefits, and solve it with a deep reinforcement learning algorithm. In this section, we propose CTMA-AC, a deep reinforcement learning offloading strategy built on the actor–critic (AC) algorithm with centralized training of multiple intelligent agents. To mitigate the impact of suboptimal strategies learned by individual agents on the transportation system, the learning information from all agents is uploaded to a shared experience pool. The information in this pool is used for collective training, after which the trained strategies are distributed to each agent’s neural network [27,28]. The number of intelligent agents, state space, action space, state transition probability, and reward function of the model are denoted by the 5-tuple

\langle M, S, A, P, R \rangle

where

M = [1, 2, \dots, m, \dots, M]

,

S = [s_{1}, s_{2}, \dots, s_{m}, \dots, s_{M}]

,

A = [a_{1}, a_{2}, \dots, a_{m}, \dots, a_{M}]

,

R = [R_{1}, R_{2}, \dots, R_{m}, \dots, R_{M}]

. The state transition probability P satisfies:

P : S \times A \times S \to [0, 1]

.

The network framework of the CTMA-AC algorithm is shown in Figure 2.

4.1. CTMA-AC Algorithm

This paper develops the CTMA-AC algorithm based on the AC framework to improve the collaborative decision-making capability in intelligent transportation systems with multiple users. To adapt to the cloud–edge–end collaborative offloading scenario, a multi-agent deep reinforcement learning framework is introduced, in which each intelligent agent maintains its own actor network and critic network and optimizes its policy through repeated interaction with the environment. In this way, collaborative learning among multiple agents in a dynamic intelligent transportation environment can be achieved. In addition, a global experience pool is constructed to accelerate learning by integrating the experience samples collected by all agents. Constraints C1–C4 are enforced during policy learning through feasibility checking and penalty regulation. Specifically, if an action violates cache-capacity, computation-capacity, or delay constraints, it is treated as infeasible and receives a strong negative penalty in the reward design, so that invalid decisions are suppressed during training. In this way, the learned policy is encouraged to remain within the feasible decision region defined by the optimization constraints. In this paper, the problem is modeled as an approximately fully observed Markov decision process (MDP) game. This assumption is based on the fact that, at each decision epoch, the scheduler can obtain the current task information, privacy level, vehicular position, and the communication and computation status required for decision making. Nevertheless, future task arrivals and partially hidden resource fluctuations may introduce partial observability in more realistic environments. Such partially observable extensions will be considered in future work. Furthermore, an experience replay mechanism and a soft target-network update strategy are adopted to improve the stability of policy learning and enhance sample efficiency. 
The proposed CTMA-AC algorithm is shown in Algorithm 1.

Algorithm 1 CTMA-AC algorithm

Require: Intelligent transportation environment parameters
Ensure: Offloading strategy for private user data segments
1: Initialization:
2: for each intelligent agent m in M do
3:     Initialize actor network $π_{m}^{*}$
4:     Initialize critic network $Q_{m}$
5:     Initialize target networks
6: end for
7: Training:
8: for episode $e \in \{1, \dots, K\}$ do
9:     All agents observe initial state $s_{t_{0}} = \{s_{1}^{t_{0}}, s_{2}^{t_{0}}, \dots, s_{M}^{t_{0}}\}$
10:     for each time slot t do
11:         for each agent do
12:             Select action $a_{t}^{m}$ based on current policy $π_{m}^{*}$
13:             Execute action $a_{t}^{m}$
14:             Receive reward $R_{t}^{m}$
15:             Observe next state $s_{t + 1}^{m}$
16:             Store $(s_{t}^{m}, a_{t}^{m}, R_{t}^{m}, s_{t + 1}^{m})$
17:         end for
18:     end for
19:     For global experience pool:
20:         Collect locally uploaded samples and combine into global experience pool
21:         Sample mini-batch transitions from global experience pool for updates
22:     Update the critic network parameters by minimizing the loss function
23:     Update the actor network parameters
24:     Soft update target networks
25:     Send latest actor network parameters to each agent
26: end for
27: return offloading strategy for private user data segments
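Two mechanical pieces of Algorithm 1, the global experience pool (steps 19 to 21) and the soft target-network update (step 24), can be sketched without any deep learning library. The Python below is a simplified illustration: the class and function names are hypothetical, network parameters are represented as plain lists of floats, and the actor and critic updates themselves are omitted.

```python
import random
from collections import deque


class SharedReplayPool:
    """Global experience pool that merges transitions uploaded by all agents."""

    def __init__(self, capacity):
        # A bounded deque silently discards the oldest transition when full.
        self._buffer = deque(maxlen=capacity)

    def add(self, agent_id, state, action, reward, next_state):
        self._buffer.append((agent_id, state, action, reward, next_state))

    def sample(self, batch_size, rng=random):
        """Uniformly sample a mini-batch without replacement."""
        k = min(batch_size, len(self._buffer))
        return rng.sample(list(self._buffer), k)

    def __len__(self):
        return len(self._buffer)


def soft_update(target, online, tau):
    """theta_target <- tau * theta_online + (1 - tau) * theta_target, element-wise."""
    return [tau * w + (1.0 - tau) * t for t, w in zip(target, online)]


pool = SharedReplayPool(capacity=3)
for step in range(4):
    pool.add(agent_id=step % 2, state=step, action=0, reward=0.0, next_state=step + 1)
batch = pool.sample(2)
new_target = soft_update([0.0, 1.0], [1.0, 1.0], tau=0.005)
```

With a small update rate the target parameters move only a fraction of the way toward the online parameters at each update, which is the stabilizing effect that the soft update is intended to provide.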

State space: At decision epoch t, the local state of agent m is composed of the current task status, the location information of the vehicle terminal, and the privacy level of the generated data fragment, i.e.,

s_{t}^{m} = (d_{t}^{m}, l o c_{c a r_{m}}^{(x, y)} (t), k_{t}^{m}) .

(27)

where

d_{t}^{m}

denotes the task generated by agent m at time slot t,

l o c_{c a r_{m}}^{(x, y)} (t)

denotes the position of vehicle terminal m, and

k_{t}^{m}

denotes the privacy level of the corresponding data fragment.

Action space: The action of agent m at time slot t is defined as

a_{t}^{m} = (d_{t}^{m (ecds)}, d_{t}^{m (cloud)}, d_{t}^{m (c a r_{m})}, p_{t}^{m (cal)}),

(28)

where

d_{t}^{m (ecds)}

,

d_{t}^{m (cloud)}

, and

d_{t}^{m (c a r_{m})}

denote the portions of the task offloaded to edge servers, cloud servers, and local execution, respectively, and

p_{t}^{m (cal)}

denotes the corresponding computation-resource allocation decision.

State transition: The system dynamics are jointly determined by the current state, the selected action, and stochastic environmental factors. Specifically, the state transition is written as

s_{t + 1}^{m} = T (s_{t}^{m}, a_{t}^{m}, ω_{t}^{m}),

(29)

where

ω_{t}^{m}

denotes the random environmental elements at time slot t, including task arrivals, wireless channel realizations, vehicle mobility, and edge-server load variations. Therefore, the next state depends not only on the current offloading and resource-allocation decision, but also on the stochastic evolution of the communication and computation environment.

Reward function: This paper designs the system reward based on latency, energy consumption, and the privacy entropy of the k-level private data during offloading in the intelligent transportation cloud–edge–end cooperative system. Since these quantities have different physical units, min–max normalization is applied to the delay and energy terms of the objective function. The normalization of delay at time slot t is given by

N^{t} (T (x)) = \frac{T (x) - T_{min}}{T_{max} - T_{min}} .

(30)

where

T_{max}

and

T_{min}

are the maximum and minimum delay values of the system, respectively.

The normalization of energy consumption at time slot t is given by

N^{t} (E (x)) = \frac{E (x) - E_{min}}{E_{max} - E_{min}} .

(31)

where

E_{max}

and

E_{min}

are the maximum and minimum energy consumption values of the system, respectively.

Considering that different user groups may have different service requirements, this paper adopts

δ_{1}

,

δ_{2}

, and

δ_{3}

as weighting factors to balance latency, energy consumption, and privacy entropy. In the DRL formulation, this corresponds to a linear weighted-sum scalarization of the original multi-objective optimization problem, which is adopted for its simplicity, training stability, and ability to flexibly reflect different service priorities. The reward function of agent m at time slot t is defined as

R_{m}^{t} = δ_{3} Θ_{n}^{t} - δ_{1} N^{t} (T (x)) - δ_{2} N^{t} (E (x)) .

(32)

Therefore, the total system reward at time slot t is

R^{t} = \sum_{m = 1}^{M} R_{m}^{t} .

(33)
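Equations (30) to (33) combine into a short computation. The following Python sketch illustrates them; the function names, the bounds, and the weight values are hypothetical and chosen only to make the arithmetic easy to check.

```python
def min_max(value, lower, upper):
    """Min-max normalization used in Eqs. (30) and (31)."""
    if upper <= lower:
        raise ValueError("upper bound must exceed lower bound")
    return (value - lower) / (upper - lower)


def agent_reward(theta, delay, energy, t_bounds, e_bounds, weights):
    """Weighted-sum reward of Eq. (32): d3 * entropy - d1 * N(T) - d2 * N(E)."""
    d1, d2, d3 = weights
    return (d3 * theta
            - d1 * min_max(delay, *t_bounds)
            - d2 * min_max(energy, *e_bounds))


def system_reward(agent_rewards):
    """Total system reward of Eq. (33): the sum over all agents."""
    return sum(agent_rewards)


# Hypothetical agent: entropy 1.5, delay 2.0 in [1, 3], energy 3.0 in [1, 5].
r = agent_reward(1.5, 2.0, 3.0, (1.0, 3.0), (1.0, 5.0), (0.4, 0.3, 0.3))
total = system_reward([r, r])
```

Here both normalized terms equal 0.5, so the agent reward is 0.3 × 1.5 − 0.4 × 0.5 − 0.3 × 0.5 = 0.1. Note that the entropy term enters unnormalized in Equation (32), so its scale relative to the normalized delay and energy terms depends on the choice of the third weight.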

Let

γ

be the reward discount factor. The optimal offloading policy

π^{*}

based on optimal action selection is defined as

π^{*} = arg max_{a} [\sum_{t = 1}^{\infty} γ^{t - 1} R^{t}] .

(34)

In this paper, the total reward obtained from the multi-agent stochastic game is used to jointly optimize delay, energy consumption, and K-level privacy entropy. Each intelligent agent selects its action by considering both its own strategy and its interaction with other agents, so as to obtain a privacy-aware and resource-efficient offloading policy.
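The discounted sum inside Equation (34) can be evaluated for a finite reward sequence with a backward recursion. The sketch below is a truncated, finite-horizon illustration of that sum; the function name and the sample values are hypothetical.

```python
def discounted_return(rewards, gamma):
    """Compute sum over t of gamma^(t-1) * R^t for a finite reward sequence."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    g = 0.0
    # Work backwards so that each earlier step discounts everything after it.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Three time slots with unit system reward and gamma = 0.5.
g = discounted_return([1.0, 1.0, 1.0], gamma=0.5)
```

For this example the result is 1 + 0.5 + 0.25 = 1.75, which shows how rewards from later time slots carry less weight in the objective.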

4.2. Algorithm Complexity Analysis

This section analyzes the complexity of the CTMA-AC algorithm. The main computational cost of the proposed algorithm is related to the neural-network structure of the actor–critic framework, the dimensions of the input and output layers, the number of intelligent agents, and the number of training episodes. Therefore, the complexity of the proposed CTMA-AC algorithm is expressed as

O ({(o p t + n e u_{i n} \times d i m_{i n} + n e u_{o u t} \times d i m_{o u t})}^{2} M^{2} \times e p o)

(35)

where

o p t

denotes the computational cost associated with network parameter optimization,

n e u_{i n}

and

n e u_{o u t}

denote the numbers of neurons in the input and output layers, respectively,

d i m_{i n}

and

d i m_{o u t}

denote the corresponding input and output dimensions of the neural network, M denotes the number of intelligent agents (vehicle terminals), and

e p o

denotes the number of training episodes. In the experimental setting of this paper,

M = 10

and

e p o = 500

, while both the actor and critic networks adopt multilayer perceptrons with two hidden layers of 128 neurons each. Under the current experimental scale, the computational cost of CTMA-AC therefore comes mainly from multi-agent interaction, neural-network parameter updates, and repeated iterative training. According to Equation (35), the training complexity increases with the network scale, the number of agents, and the number of training episodes. In addition, as the numbers of vehicles, task fragments, and candidate edge nodes increase, the state and action spaces become larger, which further increases the training overhead of CTMA-AC.
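To give a rough sense of the network scale mentioned above, the sketch below counts the trainable parameters of a fully connected network with two hidden layers of 128 neurons. The input and output dimensions used in the example are hypothetical placeholders, since the exact dimensions depend on how the state and action vectors are encoded.

```python
def mlp_parameter_count(in_dim, out_dim, hidden=(128, 128)):
    """Weights plus biases of a fully connected network with the given layer sizes."""
    sizes = [in_dim, *hidden, out_dim]
    return sum(a * b + b for a, b in zip(sizes, sizes[1:]))


# Placeholder dimensions: 4 inputs and 4 outputs.
params = mlp_parameter_count(4, 4)
```

With these placeholder dimensions the count is 17,668, of which 16,512 belong to the 128 × 128 hidden-to-hidden layer, so the per-network cost is dominated by the hidden layers rather than by the input and output sizes.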

5. Simulation Experiment and Analysis

To verify the effectiveness of the proposed scheme for offloading private user data in intelligent transportation scenarios, this section evaluates the performance of the CTMA-AC offloading strategy by comparing it with four other strategies: soft actor–critic (SAC), full-local offloading (Full-Local), deep Q-network (DQN), and random offloading.

1. Soft actor–critic strategy (SAC) [29]: The core idea is to maximize the entropy of the strategy while optimizing the cumulative reward, which maintains exploratory behavior during training and improves learning efficiency and stability. In intelligent transportation scenarios, the SAC strategy allows tasks to be offloaded to central cloud servers, edge servers, and other service entities.
2. Full-Local offloading strategy (Full-Local) [30]: Tasks are executed entirely locally, so the computation does not depend on cloud servers or edge servers, and good security performance is achieved because of the fast mobility of vehicles.
3. Deep Q-network offloading strategy (DQN) [31]: Combining Q-learning with deep neural networks enables efficient learning in high-dimensional state spaces. In intelligent transportation scenarios, the DQN offloading strategy can offload tasks to service entities such as central cloud servers and edge servers.
4. Random offloading strategy (Random offloading) [30]: Tasks are offloaded to a randomly selected service unit, such as the onboard terminal, a central cloud server, or a nearby edge server.

5.1. Experimental Setup

In the simulation experiment, the vehicle is assumed to travel at a constant speed of 20 m/s on a 10,000 m road. The scene is equipped with a central cloud server, and four edge servers are placed on both sides of the road. The number of users is set to 10. Considering interference factors such as noise and obstacles in the urban system, the WLAN carrier frequency is set to 5 GHz, which has strong anti-interference capability. The settings of the remaining important simulation parameters are shown in Table 2 [30,32]. Unless otherwise stated, the values listed in Table 2 are representative simulation settings for the considered intelligent transportation scenario. In particular, the parameters measured in MIPS are used as abstract indicators of relative computing capability in the simulation, rather than exact processor clock-rate specifications. In addition, for the proposed CTMA-AC algorithm, both the actor network and the critic network adopt the same multilayer perceptron architecture, consisting of two hidden layers with 128 neurons in each layer, and ReLU is used as the activation function. The network parameters are optimized using Adam, with the learning rates of both the actor and critic networks set to

3 \times 10^{- 4}

. The batch size is set to 64, the target-network soft update rate is set to

τ = 0.005

, and the replay buffer size is set to

10^{6}

. Furthermore, the total number of training episodes is set to 500, and each training cycle contains 80 time steps. The convergence judgment follows the criterion described in the manuscript. For fairness of comparison, the learning-based baselines, including SAC and DQN, are trained under the same simulation scenario as the proposed CTMA-AC method, with comparable state spaces, action spaces, replay-based training mechanisms, and training budgets. Their hyperparameter settings follow commonly used configurations in the reinforcement-learning literature and are reasonably adapted to the task-offloading scenario considered in this paper. In addition, the Full-Local and Random baselines share the same communication model, computation model, and task scenario as CTMA-AC, and differ only in their offloading decision rules. Therefore, the reported performance differences mainly reflect the effectiveness of different offloading strategies rather than inconsistencies in the underlying system model or training conditions.
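The stated settings can be collected in one configuration object, which also makes a few derived quantities easy to check. The Python sketch below is illustrative: the class and field names are hypothetical, the values are the ones given in this subsection, and the transition count assumes that every agent stores exactly one transition per time step, as in Algorithm 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    num_agents: int = 10
    vehicle_speed_mps: float = 20.0
    road_length_m: float = 10_000.0
    hidden_sizes: tuple = (128, 128)
    activation: str = "relu"
    optimizer: str = "adam"
    actor_lr: float = 3e-4
    critic_lr: float = 3e-4
    batch_size: int = 64
    tau: float = 0.005
    replay_capacity: int = 1_000_000
    episodes: int = 500
    steps_per_episode: int = 80

    def transitions_collected(self):
        """Total transitions if each agent stores one per time step (assumption)."""
        return self.num_agents * self.episodes * self.steps_per_episode

    def road_traversal_seconds(self):
        """Time for a vehicle at constant speed to cover the whole road."""
        return self.road_length_m / self.vehicle_speed_mps


cfg = TrainingConfig()
collected = cfg.transitions_collected()
```

Under that assumption, the full training run produces 10 × 500 × 80 = 400,000 transitions, which is below the replay buffer size of 10^6, so no samples would be evicted during training. A vehicle at 20 m/s covers the 10,000 m road in 500 s.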

5.2. Analysis of Experimental Results

In this section, we verify the effectiveness of the proposed privacy data offloading protection strategy through simulation experiments. We first analyze the convergence of the CTMA-AC scheme and the sensitivity of its weighting parameters to demonstrate its feasibility and stability. Next, we conduct comprehensive comparative analyses of the system delay, energy consumption, and K-level privacy entropy of the data under controlled experimental conditions. Finally, we draw conclusions on the basis of these analyses.

Table 3 shows the sensitivity analysis of CTMA-AC under different weighting settings. It can be observed that

δ_{1}

,

δ_{2}

, and

δ_{3}

significantly affect the tradeoff among latency, energy consumption, and privacy entropy. When

δ_{3}

is small, the system tends to prioritize latency and energy efficiency, resulting in lower latency and energy consumption. As

δ_{3}

increases, the privacy entropy also increases, indicating improved privacy-related performance, but at the cost of higher latency and energy consumption. In practical intelligent transportation applications, the values of

δ_{1}

,

δ_{2}

, and

δ_{3}

should be selected according to specific service priorities. For safety-critical or delay-sensitive services, larger

δ_{1}

and

δ_{2}

are preferred, whereas for privacy-sensitive services, a larger

δ_{3}

can be adopted to place greater emphasis on privacy protection.

Figure 3 shows that the evaluation reward fluctuates around a relatively low level before round 360, indicating that the model is still in the randomized exploration stage and has not yet reached stable performance. Around round 360, the reward increases sharply, which suggests that the training begins to benefit from the learned policy after the exploration phase. After reaching a peak around rounds 390–400, the reward slightly decreases and then stabilizes, indicating that the model gradually converges to a relatively stable performance level. Overall, although some fluctuations remain after round 360, the performance is clearly better than that in the earlier training stage.

Figure 4 shows that the latency of all schemes increases as the user task computation load grows. For task sizes of 500–600, the proposed scheme reduces latency by approximately 58.4%, 44.7%, and 89.0% compared with SAC, DQN, and Random offloading, respectively. Compared with the Full-Local scheme, the proposed method also achieves substantially lower latency; specifically, the latency of the Full-Local scheme is about 2.721 times that of the proposed scheme. This improvement is mainly attributed to the collaborative multi-agent training mechanism of CTMA-AC, which enables more effective policy sharing and a more thorough exploration of the offloading solution space.
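The paragraph above reports three comparisons as percentage reductions and one as a multiplicative factor. The two forms are interchangeable, and the short Python sketch below converts between them so that all four baselines can be read on the same scale. The function names are illustrative, and the inputs are the figures quoted in the text.

```python
def reduction_from_ratio(ratio):
    """Fractional reduction achieved against a baseline that is `ratio` times larger."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return 1.0 - 1.0 / ratio


def ratio_from_reduction(reduction):
    """Baseline-to-proposed ratio implied by a fractional reduction."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must lie in [0, 1)")
    return 1.0 / (1.0 - reduction)


full_local_reduction = reduction_from_ratio(2.721)
sac_ratio = ratio_from_reduction(0.584)
dqn_ratio = ratio_from_reduction(0.447)
random_ratio = ratio_from_reduction(0.890)
```

On this common scale, the factor of 2.721 for Full-Local corresponds to a latency reduction of roughly 63%, while the quoted reductions against SAC, DQN, and Random offloading correspond to baseline latencies of about 2.4, 1.8, and 9.1 times that of the proposed scheme, respectively.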

As shown in Figure 5, the energy consumption of each scheme tends to increase as the number of computational tasks increases. When the number of tasks is small, the difference in energy consumption between the CTMA-AC scheme and the SAC scheme is not obvious, because with fewer tasks the different servers can complete the computation more easily and therefore consume similar amounts of energy. As the number of tasks increases, the gap widens: at 900–1000 tasks, the energy consumption of the proposed scheme is about 4.7% lower than that of the SAC scheme, about 43.6% lower than that of the Full-Local scheme, about 13.9% lower than that of the DQN scheme, and about 41.5% lower than that of the Random offloading scheme.

As shown in Figure 6, the K-level privacy entropy of the CTMA-AC scheme differs very little from that of the Full-Local scheme. However, as demonstrated by the preceding experiments, the latency and energy consumption of the proposed scheme are much lower than those of the Full-Local scheme. This shows that the proposed scheme can protect users’ private information to nearly the same degree as fully local computation while still satisfying their service experience.

As shown in Figure 7, the latency of each scheme tends to increase as the number of users increases, but the CTMA-AC scheme still has the lowest latency. It is worth noting that the system model in this paper includes the V2V communication mode, i.e., vehicle users can transmit certain tasks to other vehicle terminals for computation. As a result, even as the number of users increases significantly, the latency of the CTMA-AC, SAC, and DQN schemes grows much more slowly than that of the Full-Local and Random offloading schemes.

As shown in Figure 8, the energy consumption of all the optimization schemes tends to increase with the increase in the number of users. However, the energy consumption of the CTMA-AC scheme increases gradually. This is because while an increase in the number of users leads to more computational tasks, it also increases the number of collaborative intelligent agents, which leads to a fuller exploration of the solution space. Therefore, this paper concludes that the proposed scheme can effectively adapt to the limited growth in the number of vehicle users in ITS.

Figure 9 shows that the gap between the data privacy security of the CTMA-AC scheme and that of the Full-Local scheme is very small, and that the data privacy security value of the CTMA-AC scheme changes steadily as the number of users in the intelligent transportation system increases. This indicates that the proposed scheme can still stably protect the privacy of user data in a multi-user environment.

From Figure 10, it can be observed that, as the privacy-level hierarchy of user data becomes more refined, the privacy entropy values of all schemes except the random offloading scheme show an increasing trend. This indicates that the proposed method maintains good adaptability under multi-level privacy partitioning and can still effectively support privacy-aware offloading decisions in scenarios with heterogeneous privacy sensitivities. At the same time, the results also suggest that a larger number of privacy levels can provide finer-grained privacy differentiation, which is beneficial for improving the dispersion of sensitive fragments across collaborative nodes. However, in practice, an excessively large K may also increase classification complexity and potential estimation error, whereas a smaller K reduces classification overhead but weakens the precision of differentiated privacy protection. Therefore, the choice of K should balance privacy granularity and classification cost according to the application requirements.

Figure 11 shows that, as the maximum response delay increases, the offloading failure rate of every scheme except the Random offloading scheme tends to decrease, and the failure rate of the proposed scheme is the lowest. When the maximum response delay is 1 s, the average task failure rate of the proposed scheme is 5.2% lower than that of the SAC scheme, 62.5% lower than that of the Full-Local scheme, 12.7% lower than that of the DQN scheme, and 36.4% lower than that of the Random offloading scheme. This indicates that the proposed scheme can satisfy users’ service requirements within the acceptable latency and is suitable for delay-sensitive vehicular intelligent transportation scenarios.

6. Conclusions

In recent years, edge-computing-supported intelligent transportation systems have developed rapidly, but a series of challenges related to traffic safety and user service quality have also emerged. To address the problems of service quality, data privacy, and data security in intelligent vehicular networks, this paper achieves multi-objective optimization of user privacy data offloading based on fine-grained privacy-level division of user data and differentiated privacy protection. By constructing a communication model, a caching model, a latency model, an energy-consumption model, and a K-level privacy-entropy model, the proposed scheme is validated through comparative experiments. Although the proposed framework has achieved promising results in the considered cloud–edge–end collaborative vehicular scenario, its performance under more diverse environmental settings still deserves further investigation.

In future work, we will extend the evaluation to different channel-noise conditions, vehicle speeds, and dynamic environment settings to further examine the robustness and generalization capability of the proposed method. We will also further investigate the overall problem of resource scheduling and data-traffic management planning in intelligent transportation systems so as to make the model closer to real intelligent transportation scenarios. In addition, we will also consider the monetization of edge-computing applications.

Author Contributions

Conceptualization, Y.Y. and Z.S.; methodology, Y.Y.; software, S.Z.; validation, Y.Y., S.Z. and Q.Z.; formal analysis, Q.Z.; investigation, Z.S.; resources, Y.Y.; data curation, S.Z.; writing—original draft preparation, S.Z.; writing—review and editing, Y.Y.; visualization, Q.Z.; supervision, S.Z.; project administration, Y.Y.; funding acquisition, S.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Natural Science Foundation Project of China (62172457) and the Tianjin Natural Science Foundation Project (22JCZDJC00600).

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare that they have no conflict of interest.

Figure 1. Three-tier communication architecture for cloud-edge-end collaboration.

Figure 2. CTMA-AC network framework.

Figure 3. Average reward of the CTMA-AC scheme.

Figure 4. Comparison of the latency of the schemes with different task sizes.

Figure 5. Comparison of the energy consumption of the schemes with different task sizes.

Figure 6. Comparison of the data privacy security of each scheme under different task sizes.

Figure 7. Comparison of the latency of the schemes with different numbers of users.

Figure 8. Comparison of the energy consumption of the schemes with different numbers of users.

Figure 9. Comparison of the data privacy security of each scheme with different numbers of users.

Figure 10. Comparison of the K-level privacy entropy of each scheme under different privacy levels.

Figure 11. Comparison of the task offloading failure rates of each scheme with different response latencies.

Table 1. Model parameter representation and interpretation.

Symbol	Definition
$CI_{cld}$	$CI_{cld} = \{f_{cloud}^{calc}, p_{cloud}^{calc}, p_{cloud}^{trans}\}$
$ECDS$	$ECDS = \{ecd_{1}, \dots, ecd_{n}, \dots, ecd_{N}\}$
$CI_{ECDS}$	$CI_{ECDS} = \{f_{ecd_{n}}^{calc}, s_{ecd_{n}}^{cach}, p_{ecd_{n}}^{calc}, loc_{ecd_{n}}^{(x, y)}\}$
$UDS$	$UDS = \{user_{1}, \dots, user_{m}, \dots, user_{M}\}$
$CI_{UDS}$	$CI_{UDS} = \{f_{user_{m}}^{calc}, p_{user_{m}}^{calc}, v_{user_{m}}^{trans}, loc_{user_{m}}^{t (x, y)}\}$
$D_{task}$	$D_{task} = \{D_{task_{1}}, \dots, D_{task_{d}}, \dots, D_{task_{D}}\}$
$TSK$	$TSK = \{task_{d, 1}, \dots, task_{d, i}, \dots, task_{d, I_{d}}\}$
$CI_{TSK}$	$CI_{TSK} = \{d_{task_{d, i}}^{dav}, f_{task_{d, i}}^{calc}, s_{task_{d, i}}^{cach}, t_{task_{d, i}}^{rest}\}$
$PL$	$PL = \{1, \dots, k, \dots, K\}$
$z_{d, i}$	Feature vector of fragment $i$ in task $d$
$y_{d, i}$	True privacy-level label of fragment $i$ in task $d$
$\hat{y}_{d, i}$	Predicted privacy level of fragment $i$ in task $d$
$\pi_{k}$	Prior probability of privacy class $k$
$\mu_{k}$	Mean vector of privacy class $k$
$\Sigma_{k}$	Covariance matrix of privacy class $k$
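
The last three rows of Table 1 (a prior, a mean vector, and a covariance matrix per privacy class) suggest a Bayesian decision rule with Gaussian class-conditional densities. The following is a minimal sketch under that assumption, not the authors' implementation. It restricts feature vectors to two dimensions so that the covariance inverse has a closed form, and the function names and all numeric values are hypothetical.

```python
import math


def log_gaussian_2d(z, mu, sigma):
    """Log-density of a 2-D Gaussian N(mu, sigma) evaluated at z."""
    (a, b), (c, d) = sigma
    det = a * d - b * c
    if a <= 0 or det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    dx, dy = z[0] - mu[0], z[1] - mu[1]
    # Mahalanobis distance via the closed-form inverse of a 2x2 matrix.
    maha = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return -0.5 * (2 * math.log(2 * math.pi) + math.log(det) + maha)


def predict_privacy_level(z, priors, means, covs):
    """Return the class k that maximizes log(pi_k) + log N(z; mu_k, Sigma_k)."""
    scores = {
        k: math.log(priors[k]) + log_gaussian_2d(z, means[k], covs[k])
        for k in priors
    }
    return max(scores, key=scores.get)


# Hypothetical parameters for K = 2 privacy levels (illustration only).
PRIORS = {1: 0.7, 2: 0.3}
MEANS = {1: (0.0, 0.0), 2: (3.0, 3.0)}
COVS = {1: ((1.0, 0.0), (0.0, 1.0)), 2: ((1.0, 0.0), (0.0, 1.0))}

print(predict_privacy_level((0.2, -0.1), PRIORS, MEANS, COVS))
print(predict_privacy_level((2.9, 3.2), PRIORS, MEANS, COVS))
```

Under this assumption, each fragment is assigned the privacy level with the largest posterior score, so a feature vector near a class mean is labeled with that class unless the priors outweigh the distance term.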

Table 2. Main simulation parameters and their symbolic and numerical values.

Parameter	Symbol	Value
Computing capability of cloud servers	$f_{cloud}^{calc}$	4000 MIPS
Computing power of cloud servers	$p_{cloud}^{calc}$	600 W
Computing capability of vehicle terminals	$f_{user_{m}}^{calc}$	80–220 MIPS
Computing power of vehicle terminals	$p_{user_{m}}^{calc}$	60–120 W
Transmission power of vehicle terminals	$p_{trans}^{user}$	100–160 W
Computing capability of edge servers	$f_{ecd_{n}}^{calc}$	350–750 MIPS
Cache capacity of edge servers	$s_{ecd_{n}}^{cache}$	3000 MB
Required computing capability of $task_{d, i}$	$f_{task_{d, i}}^{calc}$	60–200 MIPS
Communication bandwidth between users and edge servers	$B_{m, n}$	80 MHz
Delay weight parameter	$\delta_{1}$	0.3
Energy-consumption weight parameter	$\delta_{2}$	0.3
Privacy-entropy weight parameter	$\delta_{3}$	0.4

Table 3. Sensitivity analysis of CTMA-AC weighting parameters.

$\delta_{1}$	$\delta_{2}$	$\delta_{3}$	Latency (ms)	Energy Consumption (J)	Privacy Entropy ($K = 1$)
0.5	0.5	0	268.47	5042.81	1.74
0.3	0.3	0.4	289.36	5196.42	1.91
0.2	0.2	0.6	304.18	5328.57	1.93
0.1	0.1	0.8	326.94	5481.63	1.97
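
The trade-off in Table 3 is easier to read as relative changes against the first row, where the privacy-entropy weight is zero. The short script below is only an illustrative calculation on the tabulated values. The function name and the choice of the first row as the baseline are assumptions made here, not part of the original analysis.

```python
# Rows of Table 3: (delta1, delta2, delta3, latency_ms, energy_j, privacy_entropy)
TABLE_3 = [
    (0.5, 0.5, 0.0, 268.47, 5042.81, 1.74),
    (0.3, 0.3, 0.4, 289.36, 5196.42, 1.91),
    (0.2, 0.2, 0.6, 304.18, 5328.57, 1.93),
    (0.1, 0.1, 0.8, 326.94, 5481.63, 1.97),
]


def relative_changes(rows, baseline_index=0):
    """Percentage change of latency, energy, and entropy versus a baseline row."""
    base = rows[baseline_index]
    changes = []
    for row in rows:
        pct = tuple(
            round(100.0 * (row[j] - base[j]) / base[j], 2) for j in (3, 4, 5)
        )
        changes.append((row[2],) + pct)
    return changes


for delta3, d_lat, d_en, d_ent in relative_changes(TABLE_3):
    print(
        f"delta3={delta3:.1f}: latency {d_lat:+.2f}%, "
        f"energy {d_en:+.2f}%, entropy {d_ent:+.2f}%"
    )
```

On these values, raising the privacy-entropy weight from 0 to 0.4 increases privacy entropy by roughly 9.8% while latency rises by roughly 7.8% and energy consumption by roughly 3.0%. Raising the weight further yields smaller entropy gains (1.91 to 1.93 to 1.97) while latency and energy consumption continue to grow.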
