DNO-RL: A Reinforcement-Learning-Based Approach to Dynamic Noise Optimization for Differential Privacy

Wang, Guixin; Liu, Xiangfei; Zheng, Yukun; Zhang, Zeyu; Cai, Zhiming

doi:10.3390/electronics14153122

Open AccessArticle

DNO-RL: A Reinforcement-Learning-Based Approach to Dynamic Noise Optimization for Differential Privacy

by

Guixin Wang

^1,2,

Xiangfei Liu

^1,3

,

Yukun Zheng

¹,

Zeyu Zhang

¹

and

Zhiming Cai

^4,*

¹

Faculty of Data Science, City University of Macau, Macau 999078, China

²

School of Information Engineering, Wenzhou Business College, Wenzhou 325035, China

³

Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China

⁴

Faculty of Digital Science and Technology, Macau Millennium College, Macau 999078, China

^*

Author to whom correspondence should be addressed.

Electronics 2025, 14(15), 3122; https://doi.org/10.3390/electronics14153122

Submission received: 6 July 2025 / Revised: 31 July 2025 / Accepted: 3 August 2025 / Published: 5 August 2025

Download

Browse Figures

Versions Notes

Abstract

With the globalized deployment of cross-border vehicle location services and the trajectory data, which contain user identity information and geographically sensitive features, the variability in privacy regulations in different jurisdictions can further exacerbate the technical and compliance challenges of data privacy protection. Traditional static differential privacy mechanisms struggle to accommodate spatiotemporal heterogeneity in dynamic scenarios because of the use of a fixed privacy budget parameter, leading to wasted privacy budgets or insufficient protection of sensitive regions. This study proposes a reinforcement-learning-based dynamic noise optimization method (DNO-RL) that dynamically adjusts the Laplacian noise scale by real-time sensing of vehicle density, region sensitivity, and the remaining privacy budget via a deep Q-network (DQN), with the aim of providing context-adaptive differential privacy protection for cross-border vehicle location services. Simulation experiments of cross-border scenarios based on the T-Drive dataset showed that DNO-RL reduced the average localization error by 28.3% and saved 17.9% of the privacy budget compared with the local differential privacy under the same privacy budget. This study provides a new paradigm for the dynamic privacy–utility balancing of cross-border vehicular networking services.

Keywords:

cross-border vehicle location; differential privacy; reinforcement learning; privacy–utility balance

1. Introduction

With the rapid development of intelligent transportation systems (ITS), cross-border vehicle location services have played a key role in regional economic synergy and border security management [1]. Taking the Guangdong–Macao Greater Bay Area as an example, the average daily cross-border vehicle traffic in the region exceeds 15,000 vehicles, covering a variety of scenarios such as logistics tracking, commuting navigation, and emergency rescue [2]. However, these services face serious privacy concerns. Vehicle trajectory data often contain highly sensitive information, such as driver identity, cargo details, and border checkpoints, which must comply with the requirements of cross-border heterogeneous privacy regulations. These are often conflicting, such as the General Data Protection Regulation (GDPR) in the EU and the Consumer Privacy Act (CCPA) in California [3], which has led to the development of existing privacy protection mechanisms to balance data utility and regulatory compliance. The existing studies on differential privacy in telematics mainly rely on static noise injection mechanisms, which make it difficult to cope with spatiotemporal heterogeneity in dynamic scenarios, such as the inability to dynamically adjust the privacy protection intensity according to vehicle density, road network changes, and regional sensitivity differences, resulting in insufficient capacity for privacy–utility balancing in cross-border scenarios [4,5]. For example, Song et al. [6] proposed incorporating a differential privacy protection method in stochastic gradient descent (SGD), revealing the key roles of batch size and learning rate in the privacy–utility trade-off. Arif et al. [7] proposed an anonymization method that fuses differential privacy and generalization techniques for sensitive information protection of vehicle trajectory data, and Yang et al. [8] designed an optimization based on the k-anonymization dynamic algorithm to support personalized privacy–quality of service co-optimization under multiple attack scenarios in urban transportation. In cross-border data-sharing scenarios, the conflict between heterogeneous privacy regulations across jurisdictions and dynamic environmental parameters (e.g., vehicle density and regional sensitivity) is significantly exacerbated, making it difficult for traditional static differential privacy mechanisms to achieve an efficient privacy–utility balance.

Building on the identified gaps in existing privacy protection mechanisms, our study specifically addresses three critical deficiencies in the context of cross-border scenarios: (1) Insufficient spatio-temporal dynamic adaptability. Existing static privacy protection strategies cannot flexibly respond to real-time fluctuations in vehicle density, which leads to lagging and instability in privacy protection effects [9]. (2) The lack of cross-border regulatory compliance and the difficulty of existing mechanisms to effectively harmonize conflicting requirements between different legal systems, such as the significant difference between the “express consent” paradigm in the GDPR and the “implied consent” paradigm in the CCPA [10]. (3) There are limitations in road network risk modeling, and evenly distributing the privacy budget may lead to over-consumption of privacy protection resources in low-risk areas, thus making it difficult to adequately secure sensitive data in high-risk areas (e.g., border checkpoints, etc.) [11,12]. To address these challenges, this study proposes a Dynamic Noise Optimization via Reinforcement Learning (DNO-RL) framework, which adaptively controls privacy parameters under multi-dimensional constraints through a context-aware decision-making mechanism. To address the problem of insufficient privacy–utility trade-off capability in cross-border scenarios, this study innovatively proposes a dynamic mapping mechanism between environmental states and privacy budget parameters. The DNO-RL framework proposed in this study is a differential privacy dynamic noise optimization framework based on reinforcement learning that aims to solve the privacy–utility trade-off problem in cross-border vehicle localization. The framework can adaptively adjust the privacy protection strength according to changes in the environmental state (e.g., an increase in vehicle density or entry into sensitive areas) to enhance privacy protection while satisfying the constraints of multi-jurisdictional privacy regulations (e.g., GDPR and CCPA). Compared to existing studies, the innovative contributions of this study are reflected in the following aspects:

(1): Context-aware noise dynamic regulation mechanism: Through reinforcement learning, the intelligent body senses multi-dimensional states, such as vehicle density, positioning accuracy demand, and remaining privacy budget, in real time and dynamically optimizes the noise parameters (e.g., Laplacian noise scale) to realize the precise regulation of noise and adapt to the privacy protection demand under different environmental conditions.
(2): Compliance action space modeling under multi-regulation constraints: An action space model based on regional privacy regulation constraints is designed to ensure that the noise policy can simultaneously comply with the compliance requirements of heterogeneous privacy regulations, such as GDPR and CCPA, while meeting privacy protection needs.
(3): Lightweight online decision-making architecture: This combines a deep Q-network (DQN) with privacy budget constraints to realize low-latency noise-allocation policies for highly dynamic and real-time changing vehicular environments. This approach fills the research gap in dynamic differential privacy optimization for cross-border vehicle services and provides a scalable privacy-preserving solution for cross-border telematics applications that can adapt to the privacy regulatory requirements of different jurisdictions.

The remainder of this paper is organized as follows: Section 2 reviews related research and introduces in detail the research progress of differential privacy in telematics and the application of reinforcement learning in privacy optimization. Section 3 describes the basic concepts and definitions of differential privacy and reinforcement learning and focuses on the privacy protection mechanism and the role of intelligent agents in decision-making. Section 4 introduces the design of the DNO-RL framework, including the problem definition, algorithm framework, and key technologies. Section 5 simulates cross-border scenarios based on the T-Drive dataset, discusses the experimental results, and verifies the effectiveness and practicality of the proposed method. Section 6 summarizes the study and suggests future research directions.

2. Related Work

2.1. Differential Privacy and Its Extensions

Differential privacy (DP) is a mathematical framework [13] that ensures that the sensitivity to individual data is minimized by adding noise to the results of the data analysis. Classical DP theories include the Laplace mechanism (by adding noise that obeys the Laplace distribution) and exponential mechanism (sampling based on a utility function), which are widely used in aggregated queries and location data protection [14,15]. Subsequent research has extended the application of DP with localized differential privacy (traditional LDP), allowing users to perturb data locally before uploading them, for example, Google’s RAPPOR scheme [16]. However, this approach introduces higher noise, which leads to the degradation of data utility. To optimize the privacy budget allocation, Xu et al. [17] proposed an adaptive

ε

-allocation strategy that dynamically adjusts the budget based on the distance between each query location and the nearest sensitive location on the selected route to enhance the efficiency of privacy protection. However, this method is limited to offline scenarios and is difficult to apply directly to real-time vehicle services. Dwork et al. [18] first proposed a method for privacy preservation in private data analysis by calibrating the noise to match the sensitivity of the data and ensure that the individual data are not invertibly extrapolated. However, the applicability of this method in dynamic scenarios remains limited. Recently, dynamic privacy budget allocation methods have been proposed to optimize long-term privacy protection. These methods dynamically adjust the budget according to the sensitivity of the query or the importance of the location to improve the efficiency of privacy protection [19].Wang et al. [20] proposed a privacy budget allocation strategy based on regional privacy weights, but the strategy is not sufficiently effective in border regions due to the high computational complexity. In the field of vehicle positioning, Wu et al. [21] applied DP to protect the privacy of real-time traffic flow data and optimized the allocation of the privacy budget through dynamic thresholding. However, the adaptability of the method in scenarios with dynamically changing spatial sensitivity is still insufficient. Ghane et al. [22] attempted to optimize the protection of location data through an adaptive privacy budget, but the method lacks the modeling of context dynamics, which limits its application in cross-border scenarios.

2.2. State-of-the-Art Research on Privacy Protection in Cross-Border Scenarios

In cross-border vehicle localization scenarios, anonymization techniques for vehicle trajectories mainly include methods such as k-anonymization and geo-masking. For example, Xu et al. [23] proposed a k-anonymization method for generating virtual trajectories to mask real trajectories, which reduces the risk of individual identification by generalizing or blurring vehicle trajectories. However, this method relies on strong assumptions (e.g., uniform vehicle distribution) and is susceptible to background knowledge attacks in sparse traffic scenarios. Chen et al. [24] combine differential privacy and generalized anonymization methods to protect the privacy of sensitive vehicle trajectories. However, this method uses a fixed noise scale and cannot adapt to the dynamic nature of cross-border road networks. Min et al. [25] proposed a semantically adaptive geographically indistinguishable mechanism for quantifying personalized location privacy and balancing privacy protection and quality of service. However, this approach does not fully consider the problem of multiple regulatory constraints in cross-border scenarios, leading to significant challenges in data sharing and privacy protection across jurisdictions. In addition, sensitive areas, such as border checkpoints, impose stringent requirements on privacy protection, but these critical areas are still underexplored in existing research.

2.3. Application of Reinforcement Learning to Privacy Budget Optimization

Reinforcement learning (RL) is a machine-learning method for modeling and solving dynamic decision-making problems [26]. Sutton and Barto [27] proposed a Markov Decision Process (MDP) framework for RL that provides a theoretical foundation for modeling dynamic decision-making problems. However, DRL still has deficiencies in training stability, sample efficiency, and interpretability, and further research is needed to improve it. Li et al. [28] applied DRL to dynamic DP budget allocation in smart grids by adjusting the noise parameter through real-time load. In the field of telematics, Chen et al. [29] proposed an RL-based differential privacy mechanism that optimizes the obfuscation strategy for RL selection in vehicular ad hoc networks (VANETs) [30] and dynamically adjusts privacy parameters to enhance the privacy–utility performance. Erdemir et al. [31] applied RL to location sharing and designed a reward function to balance privacy and quality of service. Although there have been studies validating the effectiveness of RL in privacy–utility trade-offs, the existing methods still lack a design for cross-border scenarios, especially the compatibility of multi-objective reward functions with cross-domain policies, and current research exhibits a number of constraints and shortcomings [32].

(1) Inefficiency of the static noise mechanism. Traditional methods rely on fixed noise parameters and cannot adapt to spatial and temporal variations in vehicle density and road network dynamics. This static mechanism may waste the privacy budget owing to overprotection in low-risk regions, while it may result in privacy leakage owing to insufficient privacy safeguards in high-risk regions. To address this problem, this study introduces a reinforcement learning framework to improve privacy protection efficiency while optimizing data utility by constructing a state space (including vehicle density, privacy budget, and regulatory constraints) to sense environmental changes in real time and adaptively adjust differential privacy noise parameters.

(2) Lack of cross-domain privacy policy synergy. Current approaches are mostly designed for a single jurisdiction and fail to effectively address the heterogeneous conflicts between regulations, such as the GDPR and CCPA. In cross-border scenarios, vehicles frequently traverse different regulatory regions, making it difficult for existing approaches to ensure compliance. This study proposes a multi-regulation-compatible action space design that achieves the collaborative execution of heterogeneous privacy policies through constraint optimization techniques to guarantee regulatory consistency and the applicability of privacy protection in cross-border scenarios.

(3) Complexity of dynamic environment modeling. The spatiotemporal heterogeneity of vehicle movement patterns and regulatory constraints in cross-border scenarios increases the difficulty of online decision-making. This study combined DQN with privacy budget constraints to propose a lightweight online decision-making architecture with a small single inference computation to meet the needs of dynamic environments.

To address these shortcomings, this study proposes a DNO-RL framework that aims to achieve adaptive privacy protection in cross-border vehicle localization scenarios. The framework dynamically adjusts the noise parameters of differential privacy by sensing the environmental state to achieve an optimal balance between privacy protection and data availability while meeting multiple regulatory compliance requirements.

3. Related Definitions

3.1. Differential Privacy

Differential privacy provides quantifiable privacy protection in data analysis and machine-learning tasks by introducing controlled random noise [33,34]. The core idea is to effectively reduce the risk of sensitive data leakage by limiting the impact of the addition or removal of individual data records on the distribution of algorithmic outputs and by reducing the ability of attackers to infer individual information [35]. Differential privacy measures the strength of privacy protection by defining a privacy budget, where smaller values provide stronger privacy protection but may reduce data utility.

Definition 1

(

ε

-Differential Privacy [36]). There is a randomized algorithm M. Its input is dataset D, the output is

M (D)

. If for any neighboring dataset D and

D^{'}

(i.e., only one record difference), as shown in Equation (1), if the algorithm satisfies the following conditions:

Pr [M (D) \in S] \leq Pr [M (D^{'}) \in S] \cdot e^{E} + δ

(1)

Then M is said to satisfy

(ε, δ)

-differential privacy. Here, S is the output space of the algorithm M,

ε \in R^{+}

represents the privacy budget, which controls the strength of privacy protection. The smaller the value, the stronger the privacy protection.

δ \in [0, 1)

denotes the probability of failure of the differential privacy mechanism. When

δ = 0

, the algorithm satisfies pure ε-differential privacy.

Definition 2

(Global Sensitivity [37]). For query functions

f : D \to R^{d}

, the sensitivity

Δ f

is shown in Equation (2):

Δ f = max_{D \sim D^{'}} {∥ f (D) - f (D^{'}) ∥}_{1}

(2)

where

D \sim D^{'}

indicates neighboring datasets and

{∥ \cdot ∥}_{1}

represents the

L_{1}

norm. Sensitivity is used to quantify the extent to which individual data records affect query results and to determine the appropriate magnitude of noise.

Definition 3

(Laplace Mechanism [38]). The Laplace mechanism is a classical approach to achieve pure ε-differential privacy and is widely used for privacy preservation in scenarios such as reinforcement learning. Given a randomized algorithm M with sensitivity

Δ f

for the dataset D, its output result is shown in Equation (3):

M (D) = f (D) + Lap (\frac{Δ f}{ε})

(3)

where

Lap (\frac{Δ f}{ε})

denotes Laplace-distributed noise with mean 0 and scale parameter

\frac{Δ f}{ε}

.

Definition 4

(Sequential Combinatoriality [36]). Let D and

D^{'}

be any neighboring datasets (i.e., differing by only one record) and S be the output space of the algorithm M. For k sequentially executed differential privacy mechanisms

M_{1}, M_{2}, \dots, M_{k}

, where each

M_{i}

satisfies

ε_{i}

-differential privacy, their sequential composition M satisfies the following conditions, as shown in Equation (4):

Pr [M (D) \in S] \leq Pr [M (D^{'}) \in S] \cdot e^{\sum_{i = 1}^{k} ε_{i}}

(4)

Then, the overall level of privacy protection is determined by the sum of all the

ε_{i}

.

3.2. Reinforcement Learning

Reinforcement learning is a machine-learning paradigm based on the interaction between an intelligent body and its environment, with the goal of maximizing cumulative rewards and optimizing decision-making strategies, mainly through trial-and-error exploration [39]. The core idea is to use cumulative reward signals to guide the intelligent body to learn optimal behavioral strategies in uncertain environments to maximize long-term gains [40,41]. Reinforcement learning is usually modeled based on a Markov decision process in which the intelligent body performs actions in different states and receives rewards based on environmental feedback, which in turn continuously updates the strategy. The workflow is illustrated in Figure 1, taking vehicle privacy protection in cross-border scenarios as an example.

Definition 5

(Markov Decision Process [42]). The MDP is defined by a tuple

(S, A, P, R, γ)

, where S represents the state space, defined as the set of all possible states the environment can be in; A represents the action space, which is the set of actions the agent can choose from in each state; P is the state transition probability function, describing the probability distribution of transitioning to the next state after taking a specific action in a given state; R is the reward function, which evaluates the immediate feedback generated by each state transition; and

γ \in [0, 1]

is the discount factor, which is used to measure the importance of future rewards.

Definition 6.

The goal of reinforcement learning is to find an optimal policy

π^{*}

that maximizes the expected cumulative reward for the agent, starting from any initial state. The cumulative return

G_{t}

at time step t is defined by Equation (5):

G_{t} = R_{t + 1} + γ R_{t + 2} + γ^{2} R_{t + 3} + \dots = \sum_{k = 0}^{\infty} γ^{k} R_{t + k + 1}

(5)

where

R_{t + k + 1}

represents the reward obtained after executing action

a_{t}

in state

s_{t}

and transitioning to state

s_{t + 1}

and γ determines the decay rate of future rewards.

Definition 7.

Q-learning is a value-iteration-based policy optimization method that combines the processes of policy evaluation and improvement. Assuming that we can accurately estimate the value function

Q^{π} (s, a)

for any state–action pair under a given policy π by traversing all possible policies, the optimal value function is defined as the maximum achievable value, given by Equation (6):

Q^{*} (s, a) = max_{π} Q^{π} (s, a)

(6)

The optimal policy

π^{*}

is defined by Equation (7):

π^{*} = arg max_{a} Q^{*} (s, a)

(7)

4. Methodology

4.1. Problem Definition and Mathematical Modeling

Cross-jurisdictional vehicle location data usually contain sensitive information, and there is a conflict between data privacy protection and location service quality [43]. On the one hand, to protect user privacy, noise needs to be introduced in the data dissemination process to hide the real location. However, a noise scale that is too large may significantly reduce service usability. To achieve effective privacy protection for vehicle location data, this study introduces a differential privacy mechanism and proposes a dynamic noise optimization problem. Specifically, the core task of this study is to dynamically adjust the noise scale to maximize data utility while satisfying differential privacy constraints in multiple jurisdictions. This study models the differential privacy mechanism, proposes a dynamic noise-optimization method, and formalizes the noise regulation problem as a Markov decision process. The optimization objective is to minimize the average utility loss of the data after perturbation while satisfying privacy-preserving constraints.

(1) Design of Cross-border Scenarios and Sensitive Areas

In cross-jurisdictional vehicle location services, let the vehicle trajectory data be

D = {x_{1}, x_{2}, \dots, x_{n}}

, where

x_{i} = (t_{i}, {lat}_{i}, {lon}_{i})

represents the latitude and longitude coordinates of the i-th vehicle at time

t_{i}

. This study aims to minimize the average utility loss of perturbed location data through the dynamic noise optimization while satisfying differential privacy constraints. This study classifies geographical areas into two types: sensitive areas and ordinary areas. Sensitive areas refer to regions subject to stringent privacy protection regulations (e.g., border checkpoints and adjacent zones), as formally defined in Equation (8):

S_{s e n s i t i v e} = {\begin{matrix} (l a t, l o n) ∣ d_{H} ((l a t, l o n), C_{i}) \leq R_{i}, & i \in (1, 2, \dots, n) \end{matrix}}

(8)

where

C_{i} = ({lat}_{i}, {lon}_{i})

represents the geographic coordinates of the i-th border checkpoint,

R_{i}

denotes the influence radius of checkpoint i (default is 500 m), and

d_{H}

refers to the haversine distance function used to calculate the spherical distance between two locations.

(2) Differential Privacy Protection Mechanism

This study employs the Laplace mechanism to achieve

ϵ

-differential privacy protection and satisfy privacy protection constraints. For any pair of neighboring trajectory datasets D and

D^{'}

(i.e., differing by only one record), the

ϵ

-differential privacy condition must be satisfied, as shown in Equation (9):

P_{r} [M (D) \in S] \leq e^{ϵ} \cdot P_{r} [M (D^{'}) \in S]

(9)

where M represents the differential privacy perturbation mechanism and S is its corresponding output space.

Based on Formula (9), the privacy-preserving output can be expressed as

\tilde{x} = x + Lap (Δ f / ε),

where

\tilde{x}

represents the perturbed coordinates after adding noise,

Lap (Δ f / ε)

is a random variable from the Laplace distribution with scale parameter

Δ f / ε

,

Δ f

denotes the global sensitivity, and

ε

is the privacy budget parameter that controls the strength of privacy protection. According to regulatory requirements in cross-border scenarios, this algorithm adopts differentiated parameter configurations for different regions: sensitive areas comply with the GDPR standard, with

ε = 0.1

and

Δ f = 0.001

set to provide strong privacy protection, general areas comply with the CCPA standard, with

ε = 0.5

and

Δ f = 0.01

set to balance privacy protection and data utility.

(3) Privacy Budget Allocation Strategy

To achieve effective long-term privacy protection, this study proposes a dynamic budget allocation strategy based on exponential decay, as shown in Equation (10):

B_{t} = min (B_{remaining} \times (1 - β^{t}) \times α, B_{remaining})

(10)

where

B_{t}

represents the privacy budget allocated at time step t,

B_{remaining}

is the current remaining total budget,

β = 0.98

is the budget decay rate that controls the allocation speed, and

α = 0.1

is the allocation ratio factor. This strategy adopts a "conservative in the early stage, accelerated in the later stage" approach, avoiding excessive budget consumption at the beginning and ensuring continuous privacy protection throughout the entire trajectory-transmission process.

(4) Markov Decision Process Modeling

This study formalizes the dynamic noise optimization problem as a Markov decision process (MDP), defined by the tuple

(S, A, P, R, γ)

, where the following are the key parameter definitions and descriptions of the MDP in the DNO-RL algorithm.

State Space (S): The state space consists of six key features, specifically defined as follows:

S = {ρ_{t}, ν_{t}, τ_{t}, I_{t}, ε_{t}, μ_{t}},

The specific meanings of each state component are shown in Table 1.

Action Space (

A

): The action space

A

controls privacy adjustment, sensitivity adjustment, and noise scale, and is specifically defined as

A = {Δ ε, Δ s, λ}

, where the privacy budget adjustment

Δ ε \in [- 0.1, 0.1]

, sensitivity adjustment

Δ s \in [- 0.0001, 0.0001]

, and noise scale

λ_{t} \in [0.5, + \infty)

are used to determine the dynamic regulation behavior of the privacy protection strategy.

State Transition Function (P): The state transition function

P (s_{t + 1} ∣ s_{t}, a_{t})

is used to describe the evolution of the system’s state after taking an action

a_{t}

in the current state

s_{t}

, and is specifically defined as

P (s_{t + 1} ∣ s_{t}, a_{t}) = (ρ_{t + 1}, ν_{t + 1}, τ_{t + 1}, I_{t + 1}, ε_{t + 1}, μ_{t + 1})

. Here, the privacy level and utility score are updated through

ε_{t + 1} = ε_{t} + Δ ε

and

u_{t + 1} = 1 - \frac{1}{1 + e^{- ε_{t + 1}}}

, respectively. This mechanism reflects the system’s dynamic adjustment of the balance between privacy strength and data utility.

Reward Function Design (R): Reward function design adaptively balances privacy protection and data utility, making it suitable for dynamic noise optimization in cross-border vehicle trajectory scenarios. The region-sensitive reward function design is shown in Equation (11):

R (s_{t}, a_{t}) = - (ω_{privacy} (I_{t}) \cdot L_{privacy} + ω_{utility} (I_{t}) \cdot L_{utility})

(11)

The weight-allocation strategy is as follows:

ω_{privacy} (I_{t}) = \{\begin{matrix} 0.8, & if I_{t} = 1 (Sensitive Areas, GDPR - compliant), \\ 0.4, & if I_{t} = 0 (General Areas, CCPA - compliant), \end{matrix}

ω_{utility} (I_{t}) = 1 - ω_{privacy} (I_{t})

Privacy loss is modeled using the Sigmoid function, reflecting the nonlinear relationship between the privacy parameter and the risk of privacy leakage, as shown below:

L_{privacy} (ε_{t}) = σ (ε_{t}) = \frac{1}{1 + e^{- ε_{t}}}

L_{utility} (ε_{t}) = 1 - σ (ε_{t}) = 1 - \frac{1}{1 + e^{- ε_{t}}}

This strategy dynamically adjusts the weights based on whether the area is sensitive or general. In sensitive areas governed by the GDPR, privacy protection is prioritized, whereas in general areas under the jurisdiction of the CCPA, utility is emphasized.

Additionally, the discount factor

γ = 0.99

is set to regulate the trade-off between immediate privacy protection and long-term utility optimization. Based on the above mechanism, the DNO-RL algorithm aims to learn the optimal policy

π^{*} (s)

to maximize the cumulative discounted reward:

π^{*} (s) = arg max_{π} E [\sum_{t = 0}^{\infty} γ^{t} R (s_{t}, π (s_{t}))]

(12)

The definitions and meanings of the main symbols used in this study are presented in Table 2.

4.2. Dynamic Noise Optimization Algorithm Framework

Based on the above theory, this study proposes a DNO-RL framework based on deep reinforcement learning. The framework consists of a vehicle-side decision-making layer and a server-side service layer.

The vehicle-side decision layer mainly contains the following core modules.

(1) Environment Sensing Module: responsible for sensing environmental state information, such as vehicle density, area sensitivity, remaining privacy budget, and utility score.

(2) Deep Q-Network Decision Module: optimizes decisions through the main network and target network.

(3) Action Selection Module: generates and sends position information after perturbation.

(4) Differential Privacy Noise Generation Module: dynamically adjusts the noise parameters.

(5) Privacy Budget Allocation Module: implements dynamic privacy budget allocation.

The server-side service layer contains three main modules.

(1) Privacy Compliance Monitoring Module: responsible for monitoring the identification of sensitive areas and budget utilization, ensuring that location data comply with cross-jurisdictional privacy regulations.

(2) Service Quality Feedback Module: this module assesses the quality of location services in real time, generating utility feedback signals to provide performance indicators for training strategies in the vehicle-side reinforcement learning module.

(3) Cross-Border Location Services Module: this module handles location data reception, service provision, and cross-regional data transmission. The overall structure of the proposed framework is illustrated in Figure 2.

4.3. DNO-RL Algorithm Flow

The DNO-RL algorithm is based on a deep reinforcement learning framework to achieve the dynamic optimization of differential privacy parameters. Its core structure includes the state space (vehicle density, average speed, area sensitivity, current privacy level, utility score, etc.), action space (noise scale and privacy sensitivity adjustment amount), and a dynamic reward function based on scene perception. To improve training efficiency and stability, the algorithm introduces a prioritized experience playback mechanism that dynamically adjusts the sampling probability according to the temporal difference (TD) error. Simultaneously, a periodic target network update and parameter freezing strategy were adopted to reduce strategy volatility during the training process. The core execution process of this algorithm includes the following steps:

(1) The environment sensing module acquires the environmental state in real-time and constructs the state space.

(2) The deep Q-network decision module outputs the optimal action according to the current state and is used to adjust the noise scale and privacy parameters.

(3) Perturbation data that satisfy differential privacy through the Laplace mechanism are generated according to the selected action.

(4) Upload the perturbation data to the server side and verify their compliance with multiple jurisdictions (e.g., GDPR and CCPA) using the privacy compliance module.

(5) The system calculates the reward value based on the privacy protection strength and positioning accuracy to measure the privacy-utility balance.

(6) The service quality feedback module calculates the utility loss and uses the feedback signal to update the vehicle-side reward function.

(7) The main network is trained using a prioritized experience playback mechanism combined with periodic target network updates to enhance the stability and convergence of the strategy training.

The pseudo-code of the DNO-RL algorithm is presented in Algorithm 1.

Algorithm 1: Dynamic noise optimization reinforcement learning (DNO-RL)

5. Experiments

5.1. Experimental Environment

The experimental platform of this research was based on the Windows operating system, the hardware configuration was 16GB RAM and an Intel Xeon Platinum 8255C processor (Intel Corporation, Santa Clara, CA, USA ), and the software environment was Python 3.8. To address the challenges of cross-border vehicle location privacy protection, this study proposes a DNO-RL framework. The experiments included performance evaluation, noise perturbation verification, and a comparative analysis with existing algorithms, which are structured as follows.

5.1.1. Performance Evaluation Experiment

To comprehensively assess the overall performance of the DNO-RL framework, this study constructed a multi-dimensional evaluation system containing three core indicators: privacy protection strength, location service quality, and algorithm adaptation capability. The experimental scenario simulates the typical characteristics of the cross-border region, integrating real traffic flow data with multi-jurisdictional privacy regulation (e.g., GDPR and CCPA) constraints, and strives to maximize closeness to the actual application scenarios. The experiment aimed to verify the robustness of the DNO-RL in complex dynamic environments and its potential for real-world deployment.

5.1.2. Noise Validation Experiment

To investigate the dynamic performance of disturbance noise under different conditions, this experimental design focused on analyzing the effects of vehicle density and area type on noise scale. The experimental setup covered a range of vehicle densities from 10% to 100%, and the areas were categorized as normal and sensitive. By measuring and analyzing the noise scales under these diverse scenarios, the experiment aims to reveal how vehicle density and area sensitivity jointly affect disturbance intensity, thus providing theoretical support for the construction of adaptive noise management strategies in the future.

5.1.3. Algorithm Comparison Experiments

To systematically evaluate the optimization capability and actual performance advantages of the DNO-RL, two sets of comparison experiments were designed in this study to verify the performance from different perspectives.

(1) Comparison with Fixed Noise Mechanism This experiment aimed to compare the performance of DNO-RL with that of the traditional fixed noise mechanism under different privacy budget levels. Specifically, the experimental setup covered three privacy budget levels—low, medium, and high—to verify the adaptability of the DNO-RL dynamic budget allocation strategy in complex environments. By focusing on the analysis of its performance in terms of privacy protection and data utility balance, we aim to reveal its significant optimization capability in dynamic scenarios.

(2) Comparative Analysis with DQN and Double Deep Q-Network (DDQN)

To further evaluate the advantages of DNO-RL in reinforcement learning optimization, an experiment comparing DNO-RL, DQN, and the DDQN method was conducted, focusing on the following three aspects: (a) the difference in policy performance in sensitive and non-sensitive regions, (b) the convergence speed during the training process, and (c) the stability of the learning process. The experiments were evaluated using quantitative metrics and learning curves, which verified the comprehensive advantages of DNO-RL in terms of convergence efficiency enhancement, training stability enhancement, and privacy–utility trade-off optimization.

5.2. Introduction of the Dataset

This study used the publicly available T-Drive dataset, which records the trajectories of approximately 10,000 Beijing cabs, totaling approximately 15 million GPS records. The dataset contains the latitude and longitude coordinates and timestamp and speed information of each cab. By preprocessing the dataset, a virtual cross-border mobility scenario was constructed to support the study of multiregional privacy policies. To simulate a cross-border scenario, the dataset was divided into multiple regions to form an experimental environment with heterogeneous privacy constraints. The specific divisions were as follows:

(1) Sensitive Region

Definition: This region simulates the border checkpoint environment, taking the center of Beijing as a reference, randomly selecting several locations as simulated cross-border ports, and defining a circular area with a radius of 500 m around it as a sensitive region.

Privacy Requirements: The area must meet strict privacy protection requirements and comply with the General Data Protection Regulation (GDPR) standards.

Purpose: It is used to simulate cross-border checkpoint scenarios with extremely high privacy protection requirements.

(2) Ordinary area

Definition: Ordinary areas include city roads and suburbs, covering all areas of Beijing, except for sensitive areas.

Privacy Requirements: This region follows the California Consumer Privacy Act (CCPA) standards and has relatively loose privacy protection requirements.

Purpose: It is used to simulate areas with low privacy protection requirements in daily traffic environments.

(3) Data pre-processing

The following preprocessing operations were performed on the data to fulfill the experimental requirements.

Sampling. To ensure spatiotemporal continuity, the original trajectory data were downsampled at 5 min intervals.

Filtering. Trajectory points with a speed below 5 km/h, typically caused by parking or traffic congestion, were removed to eliminate redundant data.

5.3. Comparison Methods and Evaluation Metrics

5.3.1. Baseline Methods

(1) Static Laplace Mechanism (fixed

ε

): A fixed privacy budget is applied across all regions and times. The location sensitivity was set to 0.01 (approximately 1.11 km), and the noise scale parameter was defaulted to 0.01 (corresponding to

ε = 1

).

(2) Local Differential Privacy (traditional LDP): Each vehicle independently adds noise locally without considering the differences between regions. In the cross-border privacy protection model, the implementation of local differential privacy is based on an adaptive noise mechanism that adjusts the noise intensity according to the vehicle’s motion state (real-time adjustment based on speed), perturbing trajectory points in real time.

(3) Rule-based Dynamic DP (rule-based DP): The noise intensity is dynamically adjusted based on predefined rules (e.g., region type). The adjustment of noise intensity was based on vehicle density, achieving linear variation through a hyperbolic tangent function (density threshold set to 100 vehicles/km²), and a minimum noise protection threshold was set (0.02, corresponding to a 200 m location error).

5.3.2. Evaluation Metrics

Privacy Protection Level: Measured by the cumulative privacy budget

\sum_{t = 1}^{T} ϵ_{t}

, where a smaller value indicates more efficient privacy consumption.

Location Error: Measured by the mean haversine distance (in meters) between the original and perturbed locations, reflecting changes in positioning accuracy.

Communication Overhead: Refers to the frequency of data transmission per unit of time, where a higher frequency indicates greater communication costs.

5.3.3. Experimental Parameterization

This study constructed an experimental environment with heterogeneous privacy constraints in a cross-border vehicle monitoring scenario and introduced a dynamic noise-optimization mechanism based on deep reinforcement learning. The core parameter configurations of the experiment are presented in Table 3.

5.4. Analysis of Experimental Results

5.4.1. Analysis of Privacy–Utility Trade-Offs Under Different Privacy Budgets

This experiment compared the performance of the S-DP (fixed

ε

) method and DNO-RL scheme under different privacy budgets (

ε \in [0.25, 2.0]

), with the results shown in Figure 3.

The experimental results show that within the privacy budget range of

ε \in [0.25, 2.0]

, the DNO-RL strategy not only consistently outperforms the traditional S-DP method in terms of positioning error, but also significantly reduces utility loss by an average of 20% (from 5 m to 4 m). In particular, in low-privacy-budget scenarios (

ε = 0.25

), the utility loss of DNO-RL is reduced by 25% compared to S-DP and the positioning accuracy is significantly improved over the baseline method, with an error reduction of approximately 6.7%. This further validates that the method achieves a better trade-off between privacy protection and data utility.

5.4.2. Noise Scale Analysis for Different Areas and Densities

This experiment analyzed the effects of vehicle density (ranging from 10% to 100%) and type of area (common area vs. sensitive area) on the noise scale

λ_{t}

; the results are shown in Figure 4.

According to the experimental results, the DNO-RL algorithm achieved fine-grained privacy protection in cross-border scenarios by adaptively adjusting the noise scale. In the highly sensitive area, the noise intensity was enhanced by 60% on average compared with the non-sensitive area (

λ = 0.50

vs.

λ = 0.44

,

p < 0.01

), which was significantly higher than that in the non-sensitive area. In terms of vehicle density, from low-density (10%) to high-density (100%) scenarios, the noise scale shows a significant linear growth trend, with an average enhancement of 0.08–0.09 for every 22.5% increase in vehicle density, which not only ensures strong privacy protection in sensitive areas, but also maintains high data utility in non-sensitive areas, which fully validates the effectiveness and robustness of the algorithm in complex cross-border scenarios.

5.4.3. Performance Comparison of Different Algorithms in Cross-Border Scenarios

The performance comparison of different methods (DNO-RL, S-DP, traditional LDP, and rule-based DP) in cross-border scenarios was analyzed. The performance of DNO-RL with the baseline methods in terms of the localization error and privacy protection level is presented in Table 4.

The experimental results demonstrate that the DNO-RL method significantly optimizes the privacy–utility trade-off in cross-border scenarios, achieving the lowest cumulative privacy budget and communication overhead with an average localization error of 45.67 m, which is slightly higher than the S-DP method but 28.3% and 60% lower than the traditional LDP and rule-dynamic methods, respectively. Within the privacy budget range of 0.25 to 2.0, DNO-RL outperforms both traditional LDP and rule-based methods, especially in the low budget interval (<0.5), where its localization error is about 50% and 25% lower than LDP and S-DP, respectively, with performance differences converging as the budget increases. This confirms DNO-RL’s effectiveness in resource-constrained scenarios, offering better privacy–utility balancing and dynamic noise tuning. Performance advantages include (1) avoiding excessive noise via global coordination compared to LDP’s node-independent errors; (2) learning complex environmental patterns through reinforcement learning, unlike the preset threshold adjustments of rule-based methods; and (3) improving privacy budget utilization by 18% over fixed

ε

methods, despite slightly higher localization errors, achieving a valuable long-term balance in finite-resource settings.

5.4.4. Comparison Experiments with Different Algorithms

(1) Comparison with fixed noise mechanism

To better verify the robustness of the system, the DNO-RL scheme was compared with the pass S-DP method under different privacy budget levels (high, medium, and low) to verify whether the DNO-RL scheme can maintain a stable privacy protection effect under different budget constraints, as shown in Figure 5.

The experimental results show that under different privacy levels (low, medium, and high), the DNO-RL dynamic allocation scheme exhibits advantages over the fixed budget allocation method in both the privacy protection score and utility score dimensions, particularly in high-privacy-requirement scenarios. The privacy protection and utility scores of the DNO-RL scheme are higher than those of the fixed method by 2.5% and 26.9%, respectively, indicating that the privacy–utility trade-offs of the proposed scheme are significant.

(2) Comparative analysis with DQN and DDQN

This study evaluated the performance of the improved DDQN algorithm in different learning environments by comparing it with the traditional DQN and DDQN methods, as shown in Figure 6, Figure 7 and Figure 8.

Based on the above experimental results, the proposed algorithmic framework was used to solve the privacy-preservation problem in cross-border scenarios. As shown in the figure, in the sensitive region, the privacy loss of the DNO-RL algorithm is only 0.2, which is significantly lower than that of the DDQN (0.25) and DQN (0.3) algorithms, while it maintains a high utility score. In the non-sensitive region, the privacy loss of the three algorithms is generally lower, but DNO-RL still achieves the best privacy–utility balance. The convergence speed comparison shows that the DNO-RL algorithm converges in only 90 rounds, which is 24.4% and 43.0% faster than the DDQN (119 rounds) and DQN (158 rounds), respectively. As can be seen from the learning curves, not only does it converge faster, but the final average reward is significantly higher than that of the other two algorithms (approximately 100 compared to approximately 70 for DDQN and 45 for DQN). These results demonstrate the effectiveness of the prioritized empirical playback and dynamic noise adjustment strategies in improving algorithm performance and balancing the privacy–utility trade-off.

6. Conclusions

The DNO-RL framework proposed in this study achieves a dynamic balance between privacy protection and data utility in cross-border vehicle localization scenarios. Experimental results show that DNO-RL reduces the average communication frequency by 33% compared to the traditional fixation method under the same privacy budget constraints, while maintaining comparable localization accuracy. In sensitive areas, the DNO-RL showed good scene adaptation: the noise scale was approximately 13.6% higher on average than that in non-sensitive areas and showed a reasonable increase with vehicle density. The results show that DNO-RL can dynamically adjust the privacy protection intensity according to the environmental state, which provides a more flexible and efficient privacy protection solution for intelligent transportation applications, such as cross-border logistics tracking and commuter-vehicle monitoring. It should be noted that in high-privacy-requirement scenarios, the DNO-RL scheme achieves privacy protection and utility scores that are 2.5% and 26.9% higher than those of the fixed

ε

(S-DP) method, respectively, highlighting the significant advantage of the proposed approach in terms of the privacy–utility trade-off. To enhance the practicality and generalizability of the research, future work will focus on (1) validation in real cross-border scenarios, collaborating with border management authorities to verify algorithm performance; (2) modeling multi-jurisdictional complexity by developing compliance models that account for dynamic legal changes and enforcement variations; and (3) diversity in user behavior by incorporating privacy preference models from different cultural backgrounds to improve algorithm adaptability. Future research will also explore integrating differential privacy with federated learning to achieve privacy protection and secure aggregation of distributed trajectory data [47,48], enhancing privacy safeguards in cross-border scenarios.

Author Contributions

Conceptualization, G.W., X.L. and Z.C.; methodology, G.W. and Y.Z.; validation, X.L. and Y.Z.; formal analysis, G.W. and Y.Z.; investigation, X.L. and Z.Z.; resources, Z.Z. and G.W.; writing—original draft preparation, G.W., X.L. and Y.Z.; writing—review and editing, Z.C. and Z.Z.; supervision, Z.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by National Natural Science Foundation of China and Macau Science and Technology Development Joint Fund (0066/2019/AFJ): Research on knowledge-oriented probabilistic graphical model theory based on multi-source data, in part by MOST-FDCT Projects (0058/2019/AMJ): Research and Application of Cooperative Multi-Agent Platform for Zhuhai-Macao Manufacturing Service and in part by Zhejiang Provincial First-Class Course Construction Project for Undergraduate Universities: Software Engineering (YLKC1914).

Data Availability Statement

Data are available upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

CCPA	California Consumer Privacy Act
DQN	Deep Q-Network
DDQN	Double Deep Q-Network
DNO-RL	Dynamic Noise Optimization via Reinforcement Learning
DP	Differential Privacy
GDPR	General Data Protection Regulation
GPS	Global Positioning System
ITS	Intelligent Transportation Systems
LDP	Local Differential Privacy
MDP	Markov Decision Process
R-DP	Rule-Based Dynamic Differential Privacy
RL	Reinforcement Learning
S-DP	Static Differential Privacy
SGD	Stochastic Gradient Descent
VANETs	Vehicular Ad Hoc Networks

References

Kousaridas, A.; Fallgren, M.; Fischer, E.; Moscatelli, F.; Vilalta, R.; Mühleisen, M.; Barmpounakis, S.; Vilajosana, X.; Euler, S.; Tossou, B.; et al. 5g vehicle-to-everything services in cross-border environments: Standardization and challenges. IEEE Commun. Stand. Mag. 2021, 5, 22–30. [Google Scholar] [CrossRef]
Chiha, A.; Vannieuwenborg, F.; Denis, B.; Colle, D.; Verbrugge, S. Cooperative, connected and automated mobility (ccam) services provisioning in cross-border settings: Techno-economic analysis in the light of technical challenges. Transp. Policy 2023, 140, 68–84. [Google Scholar] [CrossRef]
Tahir, S.; Tahir, W. Legal challenges in cross-border data transfers: Balancing security and privacy in a globalized world. Mayo RC J. Commun. Sustain. World 2024, 1, 1. [Google Scholar]
Brambilla, M.; Nicoli, M.; Soatti, G.; Deflorio, F. Augmenting vehicle localization by cooperative sensing of the driving environment: Insight on data association in urban traffic scenarios. IEEE Trans. Intell. Transp. Syst. 2020, 21, 1646–1663. [Google Scholar] [CrossRef]
Chen, C.; Hu, X.; Li, Y.; Tang, Q. Optimization of privacy budget allocation in differential privacy-based public transit trajectory data publishing for smart mobility applications. IEEE Trans. Intell. Transp. Syst. 2023, 24, 15158–15168. [Google Scholar] [CrossRef]
Song, S.; Chaudhuri, K.; Sarwate, A.D. Stochastic gradient descent with differentially private updates. In Proceedings of the 2013 IEEE Global Conference on Signal and Information Processing, Austin, TX, USA, 3–5 December 2013; pp. 245–248. [Google Scholar]
Arif, M.; Chen, J.; Wang, G.; Geman, O.; Balas, V.E. Privacy preserving and data publication for vehicular trajectories with differential privacy. Measurement 2021, 173, 108675. [Google Scholar] [CrossRef]
Yang, M.; Wu, Y.; Chen, Y. A k-anonymity optimization algorithm under attack model. In Proceedings of the 2022 IEEE International Conferences on Internet of Things (iThings) and IEEE Green Computing Communications (GreenCom) and IEEE Cyber, Physical Social Computing (CPSCom) and IEEE Smart Data (SmartData) and IEEE Congress on Cybermatics (Cybermatics), Espoo, Finland, 22–25 August 2022; pp. 357–362. [Google Scholar]
Florin, R.; Olariu, S. Real-time traffic density estimation: Putting on-coming traffic to work. IEEE Trans. Intell. Transp. Syst. 2023, 24, 1374–1383. [Google Scholar] [CrossRef]
Wong, R.Y.; Chong, A.; Aspegren, R.C. Privacy legislation as business risks: How gdpr and ccpa are represented in technology companies’ investment risk disclosures. Proc. ACM Hum.-Comput. Interact. 2023, 7, 1–26. [Google Scholar] [CrossRef]
Al-Hussaeni, K.; Fung, B.C.M.; Iqbal, F.; Dagher, G.G.; Park, E.G. Safepath: Differentially-private publishing of passenger trajectories in transportation systems. Comput. Netw. 2018, 143, 126–139. [Google Scholar] [CrossRef]
Shen, Y.; Shepherd, C.; Ahmed, C.M.; Shen, S.; Wu, X.; Ke, W.; Yu, S. Game-theoretic analytics for privacy preservation in internet of things networks: A survey. Eng. Appl. Artif. Intell. 2024, 133, 108449. [Google Scholar] [CrossRef]
Seeman, J.; Susser, D. Between privacy and utility: On differential privacy in theory and practice. ACM J. Responsib. Comput. 2024, 1, 1–18. [Google Scholar] [CrossRef]
Mesbah, W. Dp with auxiliary information: Gaussian mechanism versus laplacian mechanism. IEEE Open J. Commun. Soc. 2025, 6, 143–153. [Google Scholar] [CrossRef]
Sharma, J.; Kim, D.; Lee, A.; Seo, D. On differential privacy-based framework for enhancing user data privacy in mobile edge computing environment. IEEE Access 2021, 9, 38107–38118. [Google Scholar] [CrossRef]
Erlingsson, Ú.; Pihur, V.; Korolova, A. Rappor: Randomized aggregatable privacy-preserving ordinal response. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, Scottsdale, AZ, USA, 3–7 November 2014; pp. 1054–1067. [Google Scholar]
Xu, C.; Luo, L.; Ding, Y.; Zhao, G.; Yu, S. Personalized location privacy protection for location-based services in vehicular networks. IEEE Wirel. Commun. Lett. 2020, 9, 1633–1637. [Google Scholar] [CrossRef]
Dwork, C.; Roth, A. The algorithmic foundations of differential privacy. Found. Trends Theor. Comput. Sci. 2014, 9, 211–407. [Google Scholar] [CrossRef]
Wang, H.; Li, Y.; Gao, C.; Wang, G.; Tao, X.; Jin, D. Anonymization and de-anonymization of mobility trajectories: Dissecting the gaps between theory and practice. IEEE Trans. Mob. Comput. 2021, 20, 796–815. [Google Scholar] [CrossRef]
Wang, Y.; Yang, J.; Zhang, J. Differential privacy for weighted network based on probability model. IEEE Access 2020, 8, 80792–80800. [Google Scholar] [CrossRef]
Wu, Y.; Liu, C.; Xie, Z. A privacy preserving incentive mechanism for intelligent transportation systems. In Proceedings of the 2024 Cross Strait Radio Science and Wireless Technology Conference (CSRSWTC), Taipei, Taiwan, 11–14 November 2024; pp. 1–3. [Google Scholar]
Ghane, S.; Jolfaei, A.; Kulik, L.; Ramamohanarao, K.; Puthal, D. Preserving privacy in the internet of connected vehicles. IEEE Trans. Intell. Transp. Syst. 2021, 22, 5018–5027. [Google Scholar] [CrossRef]
Xu, J.; Liu, L.; Zhang, R.; Xie, J.; Duan, Q.; Shi, L. Ifts: A location privacy protection method based on initial and final trajectory segments. IEEE Access 2021, 9, 18112–18122. [Google Scholar] [CrossRef]
Chen, H.; Li, S.; Zhang, Z. A differential privacy based (κ-ψ)-anonymity method for trajectory data publishing. Comput. Mater. Contin. 2020, 65, 2665–2685. [Google Scholar] [CrossRef]
Min, M.; Zhu, H.; Li, S.; Zhang, H.; Xiao, L.; Pan, M.; Han, Z. Semantic adaptive geo-indistinguishability for location privacy protection in mobile networks. IEEE Trans. Veh. Technol. 2024, 73, 9193–9198. [Google Scholar] [CrossRef]
Han, Y.; Wang, M.; Leclercq, L. Leveraging reinforcement learning for dynamic traffic control: A survey and challenges for field implementation. Commun. Transp. Res. 2023, 3, 100104. [Google Scholar] [CrossRef]
Sutton, R.S.; Barto, A.G. Reinforcement learning: An introduction. IEEE Trans. Neural Netw. 1998, 9, 1054. [Google Scholar] [CrossRef]
Li, J.; Zhang, F.; Guo, Y.; Li, S.; Wu, G.; Li, D.; Zhu, H. A privacy-preserving online deep learning algorithm based on differential privacy. In Proceedings of the 2023 26th International Conference on Computer Supported Cooperative Work in Design (CSCWD), Rio de Janeiro, Brazil, 24–26 May 2023; pp. 559–564. [Google Scholar]
Chen, X.; Zhang, T.; Shen, S.; Zhu, T.; Xiong, P. An optimized differential privacy scheme with reinforcement learning in vanet. Comput. Secur. 2021, 110, 102446. [Google Scholar] [CrossRef]
Duan, Z.; Mahmood, J.; Yang, Y.; Berwo, M.A.; Yassin, A.A.K.A.; Mumtaz Bhutta, M.N.; Chaudhry, S.A. TFPPASV: A three-factor privacy preserving authentication scheme for VANETs. Secur. Commun. Netw. 2022, 2022, 8259927. [Google Scholar] [CrossRef]
Erdemir, E.; Dragotti, P.L.; Gündüz, D. Privacy-aware location sharing with deep reinforcement learning. In Proceedings of the 2019 IEEE International Workshop on Information Forensics and Security (WIFS), Delft, The Netherlands, 9–12 December 2019; pp. 1–6. [Google Scholar]
Zhang, L.; Yan, Y.; Hu, Y. Dynamic flexible scheduling with transportation constraints by multi-agent reinforcement learning. Eng. Appl. Artif. Intell. 2024, 134, 108699. [Google Scholar] [CrossRef]
Ouadrhiri, A.E.; Abdelhadi, A. Differential privacy for deep and federated learning: A survey. IEEE Access 2022, 10, 22359–22380. [Google Scholar] [CrossRef]
Sathish Kumar, G.; Premalatha, K.; Uma Maheshwari, G.; Rajesh Kanna, P.; Vijaya, G.; Nivaashini, M. Differential privacy scheme using laplace mechanism and statistical method computation in deep neural network for privacy preservation. Eng. Appl. Artif. Intell. 2024, 128, 107399. [Google Scholar] [CrossRef]
Gutiérrez, N.; Otero, B.; Rodríguez, E.; Utrera, G.; Mus, S.; Canal, R. A differential privacy protection-based federated deep learning framework to fog-embedded architectures. Eng. Appl. Artif. Intell. 2024, 130, 107689. [Google Scholar] [CrossRef]
Dwork, C. Differential privacy. In Automata, Languages and Programming; Bugliesi, M., Preneel, B., Sassone, V., Wegener, I., Eds.; Springer: Berlin/Heidelberg, Germany, 2006; pp. 1–12. [Google Scholar]
McSherry, F.; Talwar, K. Mechanism design via differential privacy. In Proceedings of the 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS’07), Providence, RI, USA, 20–23 October 2007; pp. 94–103. [Google Scholar]
Laplace, P.S. Memoir on the probability of the causes of events. Stat. Sci. 1986, 1, 364–378. [Google Scholar] [CrossRef]
Naeem, M.; Rizvi, S.T.H.; Coronato, A. A gentle introduction to reinforcement learning and its application in different fields. IEEE Access 2020, 8, 209320–209344. [Google Scholar] [CrossRef]
Wang, X.; Wang, S.; Liang, X.; Zhao, D.; Huang, J.; Xu, X.; Dai, B.; Miao, Q. Deep reinforcement learning: A survey. IEEE Trans. Neural Netw. Learn. Syst. 2022, 35, 5064–5078. [Google Scholar] [CrossRef]
Liu, X.; Ren, M.; Yang, Z.; Yan, G.; Guo, Y.; Cheng, L.; Wu, C. A multi-step predictive deep reinforcement learning algorithm for HVAC control systems in smart buildings. Energy 2022, 259, 124857. [Google Scholar] [CrossRef]
Lovejoy, W.S. A survey of algorithmic methods for partially observed Markov decision processes. Ann. Oper. Res. 1991, 28, 47–65. [Google Scholar] [CrossRef]
Feng, T.; Zhang, Z.; Wong, W.-C.; Sun, S.; Sikdar, B. A framework for tradeoff between location privacy preservation and quality of experience in location based services. IEEE Open J. Veh. Technol. 2024, 5, 428–439. [Google Scholar] [CrossRef]
Lu, Z.; Shen, H. A new lower bound of privacy budget for distributed differential privacy. In Proceedings of the 2017 18th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT), Taipei, Taiwan, 18–20 December 2017; pp. 25–32. [Google Scholar]
Shen, X.; Jiang, H.; Chen, Y.; Wang, B.; Gao, L. Pldp-fl: Federated learning with personalized local differential privacy. Entropy 2023, 25, 485. [Google Scholar] [CrossRef]
Zhu, T.; Li, J.; Hu, X.; Xiong, P.; Zhou, W. The dynamic privacy-preserving mechanisms for online dynamic social networks. IEEE Trans. Knowl. Data Eng. 2022, 34, 2962–2974. [Google Scholar] [CrossRef]
Xu, H.; Fan, Z.; Liu, X. Application of personalized federated learning methods to environmental sound classification: A comparative study. Eng. Appl. Artif. Intell. 2024, 135, 108760. [Google Scholar] [CrossRef]
Alebouyeh, Z.; Bidgoly, A.J. Privacy-preserving federated learning compatible with robust aggregators. Eng. Appl. Artif. Intell. 2025, 143, 110078. [Google Scholar] [CrossRef]

Figure 1. Reinforcement-learning-based vehicle privacy protection in cross-border scenarios.

Figure 2. System framework diagram of DNO-RL.

Figure 3. Privacy–utility trade-off analysis under different privacy budgets.

Figure 4. Heatmap of noise scale

λ_{t}

with respect to vehicle density and area type.

Figure 4. Heatmap of noise scale

λ_{t}

with respect to vehicle density and area type.

Figure 5. Privacy protection score and utility score for different privacy budget levels.

Figure 6. Privacy–utility performance comparison of DQN, DDQN, and DNO-RL in sensitive and non-sensitive regions.

Figure 7. Convergence speed comparison of DQN, DDQN, and DNO-RL.

Figure 8. Learning curves of DNO-RL and benchmark algorithms.

Table 1. Parameters of state variables.

State Variable	Symbol	Explanation
Vehicle density	$ρ_{t}$	Current vehicle density in the area (vehicles/km²)
Average speed	$v_{t}$	Average driving speed of vehicles within the area (km/h)
Temporal characteristics	$τ_{t}$	Number of hours in a day, reflecting time patterns
Sensitive area identification	$I_{t}$	Sensitive area identification (1 for sensitive area, 0 for regular area)
Privacy level	$ε_{t}$	Current differential privacy parameter value
Utility score	$u_{t}$	Current data utility evaluation score

Table 2. Main symbol definitions.

Parameters	Define
$ε$	Privacy budget
$Δ f$	Sensitivity
$γ$	Discount factor
$Δ ε_{i}$	Privacy parameter tuning volume
$Δ s_{i}$	Sensitivity modulus
$λ_{i}$	Noise level
$d_{i}$	Vehicle density
$v_{i}$	Average speed
$τ_{i}^{*}$	Time-specificity
$δ_{i}$	Regional sensitivities
$ρ_{i}$	Privacy level
$u_{i}$	Utility score

Table 3. The hyperparameters list.

Parameter Name	Symbol	Default Value	Description
Learning Rate	$α$	0.001	Adam optimizer learning rate
Discount Factor	$γ$	0.99	Future reward discount
Initial Exploration Rate	$ϵ_{0}$	1.0	Initial $ϵ$ -greedy exploration rate
Final Exploration Rate	$ϵ_{\min}$	0.01	Minimum exploration rate
Exploration Decay Rate	$λ_{ϵ}$	0.995	Exploration rate decay
Batch Size	B	64	Training batch size
Total Privacy Budget	$B_{total}$	150.0	Global privacy budget
Budget Decay Rate	$λ_{B}$	0.98	Budget allocation decay rate

Table 4. Performance comparison of methods in cross-border scenarios.

Algorithm	Cumulative Privacy Budget ( $ε$ )	Budget/ Privacy Budget (m)	Communication Overhead (Count)
S-DP [44]	151.2	33.37	144
Traditional LDP [45]	151.2	63.68	168
Rule-based Dynamic [46]	136.8	114.15	112
DNO-RL	124.075	45.67	96

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Wang, G.; Liu, X.; Zheng, Y.; Zhang, Z.; Cai, Z. DNO-RL: A Reinforcement-Learning-Based Approach to Dynamic Noise Optimization for Differential Privacy. Electronics 2025, 14, 3122. https://doi.org/10.3390/electronics14153122

AMA Style

Wang G, Liu X, Zheng Y, Zhang Z, Cai Z. DNO-RL: A Reinforcement-Learning-Based Approach to Dynamic Noise Optimization for Differential Privacy. Electronics. 2025; 14(15):3122. https://doi.org/10.3390/electronics14153122

Chicago/Turabian Style

Wang, Guixin, Xiangfei Liu, Yukun Zheng, Zeyu Zhang, and Zhiming Cai. 2025. "DNO-RL: A Reinforcement-Learning-Based Approach to Dynamic Noise Optimization for Differential Privacy" Electronics 14, no. 15: 3122. https://doi.org/10.3390/electronics14153122

APA Style

Wang, G., Liu, X., Zheng, Y., Zhang, Z., & Cai, Z. (2025). DNO-RL: A Reinforcement-Learning-Based Approach to Dynamic Noise Optimization for Differential Privacy. Electronics, 14(15), 3122. https://doi.org/10.3390/electronics14153122

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

DNO-RL: A Reinforcement-Learning-Based Approach to Dynamic Noise Optimization for Differential Privacy

Abstract

1. Introduction

2. Related Work

2.1. Differential Privacy and Its Extensions

2.2. State-of-the-Art Research on Privacy Protection in Cross-Border Scenarios

2.3. Application of Reinforcement Learning to Privacy Budget Optimization

3. Related Definitions

3.1. Differential Privacy

3.2. Reinforcement Learning

4. Methodology

4.1. Problem Definition and Mathematical Modeling

4.2. Dynamic Noise Optimization Algorithm Framework

4.3. DNO-RL Algorithm Flow

5. Experiments

5.1. Experimental Environment

5.1.1. Performance Evaluation Experiment

5.1.2. Noise Validation Experiment

5.1.3. Algorithm Comparison Experiments

5.2. Introduction of the Dataset

5.3. Comparison Methods and Evaluation Metrics

5.3.1. Baseline Methods

5.3.2. Evaluation Metrics

5.3.3. Experimental Parameterization

5.4. Analysis of Experimental Results

5.4.1. Analysis of Privacy–Utility Trade-Offs Under Different Privacy Budgets

5.4.2. Noise Scale Analysis for Different Areas and Densities

5.4.3. Performance Comparison of Different Algorithms in Cross-Border Scenarios

5.4.4. Comparison Experiments with Different Algorithms

6. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

Abbreviations

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI