1. Introduction
With the continuous development of the Internet of Vehicles (IoV), integrated sensing and communication (ISAC) technology has garnered significant attention, highlighting its growing importance in assisting vehicular communication. Among various enabling technologies, reconfigurable intelligent surfaces (RISs) have emerged as a promising solution because of their ability to enhance signal transmission flexibly. By dynamically controlling the propagation of electromagnetic waves, RISs can achieve broader coverage and improved beamforming capabilities, making them a key technology for 6G-enabled vehicular networks [1,2,3]. In particular, hybrid RISs, which integrate both active and passive reflective elements, offer additional flexibility and efficiency in complex communication environments, providing crucial technological support for the development of smart cities and intelligent transportation systems [4,5,6].
Many scholars have conducted extensive research on RIS technology. For example, Wu et al. explored the use of RISs to reconfigure wireless environments to enhance communication efficiency [7]. Nemati et al. investigated the advantages of RISs in improving coverage in millimeter-wave cellular networks [8]. Furthermore, in the context of integrating RISs with MIMO systems to enhance vehicular network performance, Chen et al. focused on QoS-driven spectrum-sharing strategies for RIS-assisted vehicular networks, providing both theoretical and practical foundations for the application of RISs in vehicular communications [9].
Moreover, the application of RIS technology has substantially improved localization accuracy, enabling more precise position and direction estimation, which is crucial for vehicular network applications [10]. The combination of RIS technology with wireless energy harvesting networks through backscatter communication has demonstrated outstanding potential for energy-efficient communications [11]. Furthermore, an innovative dual-RIS architecture has been proposed to optimize MIMO communication systems under line-of-sight (LoS) conditions, significantly enhancing system capacity and scalability [12].
Recent studies have further explored the challenges and applications of RIS technology. Research has focused on the integration of RISs into vehicular communication networks (V2X), optimizing resource allocation, and addressing trust and interference issues in dense vehicular networks [13]. The application of hybrid RIS-assisted communication systems in device-to-device (D2D) communications has also gained attention, where performance is further enhanced through active element selection algorithms [14].
Performance analysis of Large Intelligent Surface (LIS) technology has demonstrated its significant advantages in terms of asymptotic data rates and channel stability [15]. In RIS-assisted multi-user MIMO downlink communication, beamforming optimization techniques have shown great potential in mitigating interference and improving communication quality [16]. Meanwhile, researchers have developed more advanced channel estimation techniques, such as a two-stage channel estimation method considering the Doppler effect, which significantly enhances estimation accuracy in hybrid RIS-assisted MIMO systems [17,18]. Moreover, RIS-specific hardware architectures that utilize a minimal number of active elements have been proposed to facilitate more efficient channel estimation processes and improve overall system performance [19,20,21]. Lastly, the prospects of intelligent reflecting radio technology have been further emphasized, underscoring its potential in future wireless environments [13,22,23,24,25].
In the field of vehicular networks, the rapid growth of connected devices and mobile services has led to increasingly congested spectrum resources. High-frequency signal transmission suffers from significant path loss, and non-line-of-sight (NLoS) paths are typically weak, while line-of-sight (LoS) paths are prone to blockage by environmental objects [26,27]. As a result, the performance of ISAC systems is highly dependent on the propagation environment, leading to coverage limitations and multiple blind spots. RIS technology offers a promising solution by enabling programmable control over incident signals, including amplitude, phase, polarization, focusing, and attenuation. By intelligently shaping the spatial electromagnetic environment and optimizing signal transmission paths, RISs can mitigate inter-user interference, enhance coverage, and improve spectral and energy efficiency, effectively addressing key challenges in ISAC systems [28]. However, in active RIS-assisted communication systems, the complexity and variability of external environments, along with the noise generated by active RIS elements, can significantly degrade communication link quality, ultimately reducing system performance [29,30,31].
To tackle this issue, this study aims to develop a hybrid RIS adaptive resource allocation framework specifically designed for ISAC-enabled vehicular networks. The proposed framework optimizes RIS phase configuration and beamforming strategies to minimize multi-user interference (MUI), thereby maximizing communication efficiency and reliability. The findings of this study provide a theoretical foundation for deploying RIS technology in next-generation 6G vehicular networks, addressing critical challenges related to communication reliability and efficiency. To this end, a joint optimization algorithm based on a Twin Delayed Deep Deterministic Policy Gradient (TDDPG) is proposed. This optimization problem considers both communication performance and sensing accuracy while adhering to practical constraints such as total power budget, RIS noise power limitations, and Cramér–Rao Lower Bound (CRLB) constraints. The main contributions of this paper are as follows:
TDDPG-Based Joint Optimization Framework: A novel TDDPG-based algorithm is designed to jointly optimize the transmit beamforming and RIS phase shift matrix. Two coordinated DDPG agents are employed to handle the coupling between communication and sensing tasks. By sharing a unified reward function that incorporates multi-user interference (MUI), total power constraints, and the Cramér–Rao Lower Bound (CRLB) for sensing, the framework ensures balanced performance improvements in both communication and radar functionalities.
Hybrid RIS Architecture for Efficiency–Performance Tradeoff: This work adopts a hybrid RIS architecture combining active and passive elements to improve energy efficiency and reduce hardware complexity. The proposed system can dynamically adjust the proportion of active RIS units to achieve a balance between communication performance and cost, making it suitable for real-world vehicular scenarios.
Robustness in Dynamic Vehicular Environments: The TDDPG algorithm demonstrates faster convergence and better adaptability compared to conventional DDPG methods, especially in dynamic and interference-rich environments. Simulation results confirm that the proposed method achieves lower average MUI and higher communication sum rates while also meeting radar sensing constraints under varying network conditions.
The remainder of this paper is organized as follows: Section 2 provides an overview of the system model and problem formulation. Section 3 introduces the design and solution of the hybrid RIS-assisted ISAC optimization problem based on the Twin Delayed Deep Deterministic Policy Gradient (TDDPG) algorithm. Section 4 presents the experimental results and analysis. Finally, Section 5 concludes the study and discusses potential future research directions.
2. System Model and Problem Formulation
In this paper, we consider the hybrid RIS-assisted IoV ISAC scenario shown in Figure 1. The direct links between the Dual-Functional Base Station (DFBS) and the U targets are blocked by multiple buildings. To overcome this problem, two hybrid RISs are deployed around the buildings to establish a multi-reflection signal transmission path between the DFBS and the targets. In addition, the hybrid RIS controller can intelligently adjust the phase shifts of the reflected signals, thereby optimizing the communication quality and ensuring accurate and efficient information transmission.
In this paper, we consider the downlink transmission link from DFBS to the target vehicle. Let
and
denote the discrete-time complex baseband signals for communication and radar sensing, respectively. Assuming that these signals are mutually uncorrelated and have unit power, the DFBS precodes
c using the communication beamformer
and precodes
r using the sensing beamformer
, and then the overall downlink transmitted signal
is the superposition of the communication and radar signals, denoted by
and the corresponding transmission covariance matrix is denoted as
. This signal reaches the target vehicle through reflections at the hybrid RISs.
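As a concrete reference, a common way to write this superposition and its covariance is sketched below. The symbols are illustrative (the paper's own notation is omitted in this copy), and the covariance form assumes unit-power, mutually uncorrelated communication and sensing symbols:

```latex
% Illustrative notation: W_c, W_r are the communication and sensing beamformers,
% s_c, s_r the unit-power, mutually uncorrelated symbol vectors.
x = W_c s_c + W_r s_r,
\qquad
R_x = \mathbb{E}\{x x^{H}\} = W_c W_c^{H} + W_r W_r^{H}.
```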
In this paper, we consider the deployment of a square hybrid RIS with
M elements, where
denotes the RIS coefficient. Without loss of generality, this paper assumes that the first
K reflecting cells of the RIS are active and their maximum amplifier gain is
. In this paper, the index sets of active and passive reflection units are denoted as
and
, respectively. Thus, when
, we get
; when
, we get
. Let
denote the selection matrix formed by the first
K rows of the identity matrix and
denote the selection matrix formed by the remaining
rows. When the input signal is
, the output of the hybrid RIS is
where
n is the RIS noise caused by the active component. Furthermore, the total output power of the active reflecting element can be expressed as
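A common per-element formulation of such a hybrid RIS is sketched below purely for illustration; the symbols (the coefficient φ_m, the active index set, the element noise n_m, and the gain bound a_max) are assumptions standing in for the paper's omitted notation:

```latex
% Illustrative per-element model (symbols assumed): phi_m is the m-th RIS
% coefficient, A the set of active elements, n_m the noise of an active
% element, a_max the maximum amplifier gain, and x_m the incident signal.
y_m =
\begin{cases}
\varphi_m x_m + n_m, & m \in \mathcal{A},\ |\varphi_m| \le a_{\max},\\
\varphi_m x_m,       & m \notin \mathcal{A},\ |\varphi_m| = 1,
\end{cases}
\qquad
P_{\mathrm{act}} = \sum_{m \in \mathcal{A}} \mathbb{E}\big\{ |\varphi_m x_m + n_m|^2 \big\}.
```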
In the multi-reflection signal path model, this paper considers the presence of two hybrid RISs, and the reflection amplitude is set to 1 to maximize the power of the reflected signal and to facilitate hardware implementation.
Let , , and denote the channel matrices from the base station to RIS 1, from RIS 1 to RIS 2, and from RIS 2 to the target vehicle, respectively. Only the LoS channel between adjacent nodes is considered in the model, and each LoS channel is modeled as the product of the array responses on its two sides. For the uniform linear array (ULA) at the DFBS, the array response is denoted as , where T is the number of antennas, is the normalized antenna spacing, is the carrier wavelength, and represents the azimuth angle of the Angle of Departure (AOD) from the DFBS to RIS 1. The array response at each hybrid RIS is denoted by , where and denote the azimuth and elevation angles of its Angle of Arrival (AOA) or AOD, respectively, and d is the reflector element spacing. Therefore, according to the position of the target vehicle, define and as the azimuth and elevation angles of the AOA from the DFBS to RIS 1, define and as the azimuth and elevation angles of the AOD from RIS 1 to RIS 2, and define and as the azimuth and elevation angles of the AOA from RIS 1 to RIS 2. Define and as the azimuth and elevation angles of the AOD from RIS 2 to the target vehicle u.
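As a point of reference, the standard ULA array response of the kind described above can be written as follows (illustrative notation; δ denotes the normalized antenna spacing and θ the departure azimuth):

```latex
% Illustrative only: standard ULA response with T antennas, normalized
% antenna spacing delta, and azimuth angle theta.
a(\theta) = \big[\,1,\; e^{j 2\pi \delta \sin\theta},\; \dots,\; e^{j 2\pi \delta (T-1)\sin\theta}\,\big]^{\mathrm{T}} .
```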
Let
denote the distance from BS to RIS 1,
the distance from RIS 1 to RIS 2, and
the distance from RIS 2 to the target vehicle. We assume that in the RIS-assisted vehicular ISAC system,
H represents the wireless channel response matrix between different nodes (base station, RIS, vehicle). Specifically,
H describes the propagation gain and phase offset of the line-of-sight (LoS) link between each pair of nodes, which is modeled as the product of the transmit and receive array response vectors.
Y represents the signal matrix received by the target vehicle, reflecting the result of signal transmission through the RIS-assisted channel. Specifically,
Y contains the symbols to be transmitted, interference, and noise. Then, the channel
from BS to RIS 1 is denoted by
where
represents the LoS path gain at a reference distance of 1 m. The channel
from RIS 1 to RIS 2 is denoted by
The channel
from RIS 2 to the target vehicle is denoted by
According to the above channel expression, the output signal at RIS 1 can be expressed as
The output signal at RIS 2 can be expressed as
By expressing the desired symbol matrix at the target vehicle as
, the received signal matrix can be rewritten as
where
represents the noise at the target vehicle and the second term represents the multi-user interference (MUI). Then, the total energy of the MUI can be expressed as
The SINR of the
u-th user can be expressed as
where
denotes the
u-th row of
and
denotes the expectation. Then, the achievable sum rate of the system can be expressed as
The above equation shows that the achievable data rate of user
u can be maximized by minimizing the received interference of user
u, so minimizing the MUI energy is closely related to maximizing the sum rate. Meanwhile, the MUI energy is also closely related to the MSE between the received signal and the desired signal. The MSE is defined as
Therefore, this paper optimizes the sum rate and MUI of the communication system by minimizing the MSE.
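For reference, the standard relations behind this argument are sketched below with illustrative notation (H_eff for the end-to-end BS–RIS–RIS–vehicle channel, X for the transmit block, S for the desired symbol block, and σ² for the receiver noise power); these are not the paper's exact equations:

```latex
% Illustrative sketch of the MUI / sum-rate / MSE relations.
E_{\mathrm{MUI}} = \big\| H_{\mathrm{eff}} X - S \big\|_F^{2},
\qquad
R_{\mathrm{sum}} = \sum_{u=1}^{U} \log_2\!\big(1 + \mathrm{SINR}_u\big),
\qquad
\mathrm{MSE} = \mathbb{E}\big\{ \| Y - S \|_F^{2} \big\} = E_{\mathrm{MUI}} + \text{(constant noise term)} .
```

Under this common formulation, minimizing the MSE is equivalent, up to the constant noise term, to minimizing the MUI energy, which in turn tightens the lower bound on the achievable sum rate.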
For radar systems, a fundamental function is to estimate the AOD of the target. In this paper, the radar is considered to estimate the AOD of a target located at angle
. The signal received by the radar can be expressed as
where
is the reflection coefficient (radar cross section, RCS) of the target, and
is the antenna steering vector.
is the Additive White Gaussian Noise (AWGN) with zero mean and covariance matrix
.
Therefore, the parameters to be estimated for the received signals can be expressed as
The Fisher information matrix (
) of the received signal has been calculated as [
32]
where
,
, and
denotes the derivative of the steering vector, which can be expressed as
, where
denotes the
i-th element of
.
The Cramér–Rao Lower Bound (CRLB) is a key determinant of the performance of radar communication systems, as it provides a lower bound on the MSE of any unbiased estimator. Then, when the target is located at direction
, the CRLB of the arrival angle is
, which can be expressed as
To accurately estimate the AOD of the target, the CRLB needs to be less than a given threshold
:
Let
. Then, Equation (
18) can be rewritten as
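For orientation, the generic single-parameter CRLB form implied by the discussion above is sketched below. This is not the paper's Equation (18); the partitioning of the FIM and the threshold symbol ε are illustrative assumptions:

```latex
% Illustrative: CRLB of the angle from the Fisher information matrix F,
% with the nuisance (reflection-coefficient) parameters marginalized out.
\mathrm{CRLB}(\theta) = \big[\mathbf{F}^{-1}\big]_{1,1}
= \Big(F_{\theta\theta} - \mathbf{F}_{\theta\beta}\,\mathbf{F}_{\beta\beta}^{-1}\,\mathbf{F}_{\beta\theta}\Big)^{-1}
\le \epsilon ,
```

where ε stands for the accuracy threshold mentioned above.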
3. Design and Solution of the Hybrid RIS-Assisted ISAC Optimization Problem in the Internet of Vehicles
The use of active components in the RIS introduces additional noise when the RIS output signal is forwarded to the target vehicle, which adversely affects both the radar and the communication SNR. In this paper, the optimal phase shifts of each RIS, computed at the base station, are sent to the intelligent RIS controller under the constraints of the total transmit power, the RIS noise power, and the CRLB of the arrival-angle estimation. At the same time, an appropriate beamformer is selected to minimize the communication MSE of the system while guaranteeing a minimum communication SNR to suppress the RIS noise. The optimization problem can be formulated as
where
represents the
-th element of
, constraint
restricts the ISAC waveform to be constant-modulus,
is the total transmit power constraint at DFBS,
is the total power constraint at RIS, and
is the CRLB constraint to accurately estimate the target AOD.
To cope with the tight coupling between the beamforming and the RIS phase shift matrix design in RIS-assisted communication, this paper proposes a novel joint optimization method based on the TDDPG architecture. Two DDPG structures are employed within the TDDPG framework: one is responsible for outputting the RIS phase shifts, while the other focuses on generating the joint beamforming vector. This design effectively decouples the design of the RIS phase shifts from that of the beamforming vector and simplifies the optimization task. In addition, the two DDPG structures cooperate by sharing a common reward function, which guides each agent to adjust its action policy and ensures that the output RIS phase shifts and beamforming vector significantly reduce the system MSE, thereby improving the communication performance.
3.1. DDPG Method
Traditional RL methods perform well on tasks with small, discrete action and sample spaces. However, they often struggle with complex tasks such as RIS-assisted vehicular communication, in which the state space is huge, the action space is continuous, and the input data may consist of high-dimensional observations such as images. To address this problem, the DeepMind team introduced DQN, which uses a Deep Neural Network (DNN) to approximate the Q-value function of Q-learning, combining the advantages of deep learning and reinforcement learning. The DDPG algorithm further builds on DQN by combining the deterministic policy gradient (PG) algorithm with the experience replay pool and target network techniques of DQN. DDPG adopts the Actor–Critic architecture, directly outputs deterministic actions, and thereby solves continuous control problems, making DRL methods better suited to the needs of the wireless communication field.
In a single DDPG algorithm, four neural networks are employed to model different functions. Specifically, the Actor network is a parameterized network responsible for representing the behavior policy, while the Critic network is another parameterized network that predicts the long-term reward, i.e., the Q-value, obtained by taking a particular action in the current state. In addition, to cope with the bootstrapping problem that may arise during parameter updates, the concept of the target network is introduced: the Target Actor network and the Target Critic network are copies of the Actor network and the Critic network, respectively. Their presence helps stabilize the learning process and prevents violent oscillations of the networks during training.
First, the agent takes state from the environment and executes the action produced by the Actor network; then, it obtains an immediate reward and moves on to the next state . The experience tuple is stored in the experience replay buffer D, and samples with batch size are randomly selected to train the neural network.
The output of the Actor network is the action performed by the agent. During the training process, the parameters of the Actor network are updated to maximize the cumulative expected reward by gradient ascent, which uses the Adam optimizer to update
. Specifically, the Actor network parameters are updated according to the policy gradient formula. Since the DDPG algorithm draws samples from the experience replay pool, a Monte Carlo method is used to compute the policy gradient. Therefore, the Actor network parameters are trained and updated via the following policy gradient, which continuously improves its behavior policy:
where the performance objective function
J is designed for the off-policy learning scenario; it quantitatively measures the performance of a policy and guides the parameter updates of the Actor network so as to optimize the behavior of the agent. The function is defined as follows:
where
s is the state of the environment and
is the probability distribution function based on the state generated by policy
.
is the value of Q resulting from choosing an action according to policy
at each state
s. The update of the Q-value satisfies the Bellman equation, which is expressed as follows:
This formula describes the Q-value of taking action a in a given state s as the sum of the immediate reward r and the discounted reward obtained by subsequently acting so as to maximize the long-term future return, where the future reward is weighted by a discount factor . The discount factor determines how strongly possible future rewards are weighted against current rewards in the decision-making process.
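For completeness, the standard DDPG forms of the policy gradient and the Bellman relation described above are given below (illustrative notation: μ is the Actor with parameters θ^μ, Q the Critic with parameters θ^Q, and the sum runs over a replayed mini-batch of N samples):

```latex
\nabla_{\theta^{\mu}} J \approx \frac{1}{N}\sum_{i=1}^{N}
\nabla_{a} Q\big(s_i, a \mid \theta^{Q}\big)\Big|_{a=\mu(s_i)}\,
\nabla_{\theta^{\mu}} \mu\big(s_i \mid \theta^{\mu}\big),
\qquad
Q(s,a) = \mathbb{E}\big[\, r + \gamma\, Q\big(s', \mu(s')\big) \,\big].
```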
In the DDPG algorithm, the experience replay mechanism plays a crucial role. This mechanism builds a memory to store the four-tuple data generated by each state transition. During training, the algorithm randomly draws samples from the experience replay pool to update the network parameters. This random selection breaks the correlation between consecutive samples and improves the stability and convergence speed of the algorithm.
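As a minimal illustration of this mechanism, the Python sketch below implements a uniform-sampling replay buffer. The default capacity and batch size mirror the values used later in Section 4, but the class itself is an assumption for illustration, not the paper's code:

```python
import random
from collections import deque

class ReplayBuffer:
    """Uniform-sampling experience replay buffer (illustrative sketch)."""
    def __init__(self, capacity=1600):
        self.buffer = deque(maxlen=capacity)   # oldest transitions are discarded automatically

    def store(self, state, action, reward, next_state):
        # Each entry is the four-tuple generated by one state transition.
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size=32):
        # Uniform random sampling breaks the temporal correlation between samples.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)
```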
The Critic network is used to fit
and output the Q-value for performing action
a in a given state
s. The update of the network parameters
of the Critic is achieved by minimizing the error between the evaluation value
and the target. Here, the error
can be expressed as follows:
where
can be viewed as the target Q-value and its expression is
This target value is obtained by evaluating the Target Actor network and the Target Critic network . Introducing these target networks makes the Critic network more stable during parameter learning and easier to converge. This design effectively improves the robustness and learning efficiency of the algorithm.
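In standard DDPG notation (again illustrative, since the paper's symbols are omitted in this copy), the Critic loss and the target value take the form:

```latex
% y_i is the target value computed with the Target Actor mu' and Target Critic Q'.
L(\theta^{Q}) = \frac{1}{N}\sum_{i=1}^{N}\big(y_i - Q(s_i, a_i \mid \theta^{Q})\big)^{2},
\qquad
y_i = r_i + \gamma\, Q'\big(s_{i+1}, \mu'(s_{i+1} \mid \theta^{\mu'}) \,\big|\, \theta^{Q'}\big).
```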
In the DDPG method, the target network adopts the strategy of soft updates. This soft update method ensures the smooth change in the target network parameters so that the target value calculated by the target network is more stable. This stability is crucial in the learning process of the Critic network. Let the learning rate in the target network update process be
; then, the target network parameters are updated as follows, ensuring a smooth transition of the parameters and stable learning:
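For reference, the standard DDPG soft-update rule, with τ standing in for the update rate defined in the text above (the paper's own symbol is omitted in this copy), reads:

```latex
\theta^{Q'} \leftarrow \tau\,\theta^{Q} + (1-\tau)\,\theta^{Q'},
\qquad
\theta^{\mu'} \leftarrow \tau\,\theta^{\mu} + (1-\tau)\,\theta^{\mu'} .
```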
3.2. TDDPG Method
Building on the DDPG architecture and parameter update mechanism described above, this section adopts a novel TDDPG structure. The first DDPG is responsible for learning the optimization strategy of the RIS phase shift matrix. To reduce the communication MSE of the system, the first DDPG defines the relevant elements, namely the state , the action , and the reward function in the n-th time interval, as follows:
- 1.
State : The first DDPG is used for RIS phase shift matrix optimization, and the position information of the target vehicle will be used as the state input of this network;
- 2.
Behavior : This represents the action output by this network, which is the phase shift change in the RIS in the next time interval;
- 3.
Reward
: The reward function
evaluates the behavior of the agent, and its construction directly determines the quality of the training results. The reward function is constructed as follows:
where
,
, and
are the penalty terms when the constraints
,
, and
are not satisfied and
,
, and
are the weight coefficients corresponding to each penalty term, respectively.
Regarding the influence of the parameter p on the reward function: when the penalty term is linear, it exerts a roughly constant restraining force on actions that violate the constraints, pushing the agent away from infeasible strategies early on; during learning, this manifests as constraint violations decreasing rapidly in the early training stage. When the penalty term is quadratic or logarithmic, the agent is more likely to approach the optimal boundary value in the middle and later stages of training, but the Actor is prone to jitter in the early stage.
In the simulation of
Section 4 of this article, the value of the parameter
p is determined by the following formula:
To ensure the stability of the training process, the value of the q parameter is usually determined empirically. Regarding its influence on the reward function: when q is small, the agent prioritizes maximizing the long-term reward accumulated from the optimization task while somewhat relaxing the constraints; in practice, the transmit power occasionally exceeds the threshold, leading to increased energy consumption or inter-base-station interference. When q is large, the agent strictly satisfies the constraints but learns more slowly than with a lower q, resulting in slow convergence and a reduced system communication rate. One application may prioritize power constraints while another prioritizes sensing accuracy, so different values of q are generally selected according to the core task of each scenario.
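To make the penalty structure concrete, the following Python sketch evaluates a reward of this shape; all names (mse, the power and CRLB limits, the weights q1–q3, and the exponent p) are illustrative assumptions rather than the paper's exact symbols:

```python
def penalized_reward(mse, power_bs, power_ris, crlb,
                     p_bs_max, p_ris_max, crlb_max,
                     q1=1.0, q2=1.0, q3=1.0, p=1.0):
    """Negative-MSE objective minus weighted penalties for violated constraints (sketch)."""
    def penalty(value, limit):
        excess = max(value - limit, 0.0)  # zero when the constraint holds
        return excess ** p                # p controls how sharply violations are punished
    reward = -mse
    reward -= q1 * penalty(power_bs, p_bs_max)    # total transmit power constraint
    reward -= q2 * penalty(power_ris, p_ris_max)  # active-RIS output power constraint
    reward -= q3 * penalty(crlb, crlb_max)        # CRLB (sensing accuracy) constraint
    return reward
```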
The second DDPG module in the TDDPG structure is used for beamforming optimization and outputs the transmit beamforming vector for a given state. The state , action , and reward of the second DDPG in the -th time interval are defined as follows:
- 1.
State : The cascaded channel will be the state input of the second DDPG;
- 2.
Behavior : The behavior of the second DDPG output will be defined by the beamforming vector at the transmitter;
- 3.
Reward
: Since the purpose of the TDDPG algorithm adopted in this paper is to jointly optimize the beamforming matrix at the transmitter and the RIS phase shift coefficient to minimize the system communication MSE, the second DDPG module shares the same reward function with the first DDPG module, that is, the reward function in Equation (
28), so as to improve the communication efficiency.
The structure of the TDDPG algorithm is shown in
Figure 2.
The procedure of the TDDPG algorithm is shown in Algorithm 1.
Algorithm 1: TDDPG algorithm
1: Initialization: Initialize the Actor network , Critic network , Target Actor network , and Target Critic network of the first DDPG in the TDDPG; initialize the Actor network , Critic network , Target Actor network , and Target Critic network of the second DDPG in the TDDPG;
2: for each training episode do
3:   Reset the RIS phase shift matrix and the target vehicle position;
4:   for each time step do
5:     Set the location of the target vehicle as state  and the cascaded channel as state ;
6:     Select the corresponding actions  and ;
7:     Calculate the immediate rewards  and  for taking actions  and  according to Equation (27);
8:     Calculate the transition states  and ;
9:     Store the state transition data  and  in the experience replay pool ;
10:    Sample state transition data from the experience replay pool and update , , , ;
11:    Update , , , ;
12:  end for
13: end for
The TDDPG algorithm proposed in this section feeds the position information and the cascaded channel into the two DDPG networks as two independent states, so that the algorithm outputs the RIS phase shift matrix and the beamforming strategy, respectively, thereby decomposing the complex optimization problem into two independent sub-problems. In this process, the TDDPG realizes the cooperation of the two DDPG agents by introducing a shared reward function and shared wireless environment information; each agent learns and optimizes the phase shift matrix or the beamforming strategy so as to better cope with dynamic changes in the wireless environment and achieve more accurate and efficient optimization.
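A structural sketch of this two-agent arrangement is given below. The environment dynamics, the dimensions, and the DDPGAgent internals are placeholders (assumptions for illustration), and only the shared-reward wiring reflects the design described above:

```python
import numpy as np

class DDPGAgent:
    """Minimal stand-in for one DDPG agent; act() returns a bounded continuous action."""
    def __init__(self, state_dim, action_dim, seed=0):
        self.rng = np.random.default_rng(seed)
        self.state_dim, self.action_dim = state_dim, action_dim

    def act(self, state, noise_scale=0.1):
        # Placeholder exploration policy; a real agent would query its Actor network here.
        return np.tanh(self.rng.normal(scale=noise_scale, size=self.action_dim))

    def store_and_update(self, state, action, reward, next_state):
        # Replay storage, Critic/Actor updates, and soft target updates would go here.
        pass

# Agent 1 maps the vehicle position to RIS phase shifts;
# Agent 2 maps the (flattened) cascaded channel to the transmit beamformer.
phase_agent = DDPGAgent(state_dim=3, action_dim=64, seed=1)                  # 64 RIS elements assumed
beam_agent = DDPGAgent(state_dim=2 * 64 * 4, action_dim=2 * 4 * 4, seed=2)   # complex entries split into re/im

for episode in range(100):
    vehicle_pos = np.zeros(3)                      # placeholder environment reset
    cascaded_channel = np.zeros(2 * 64 * 4)
    for step in range(50):
        phase_action = phase_agent.act(vehicle_pos)
        beam_action = beam_agent.act(cascaded_channel)
        # A single shared reward (e.g., the penalized-MSE reward sketched earlier)
        # evaluates the joint effect of both actions.
        reward = -float(np.random.rand())          # placeholder for the true reward
        next_pos, next_channel = vehicle_pos, cascaded_channel  # placeholder transition
        phase_agent.store_and_update(vehicle_pos, phase_action, reward, next_pos)
        beam_agent.store_and_update(cascaded_channel, beam_action, reward, next_channel)
        vehicle_pos, cascaded_channel = next_pos, next_channel
```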
3.3. Analysis of Algorithm Complexity and Extensibility
In this subsection, we examine the engineering feasibility of the TDDPG algorithm proposed in this paper. The number of floating-point operations required for each execution of the TDDPG algorithm can be expressed as
where
is the length of the state vector,
is the length of the action vector,
H is the number of units in each hidden layer, and
L is the number of hidden layers. Therefore, the time complexity of this algorithm is
, and the number of floating-point operations required for each update is approximately
. Assuming that a computing platform with a computing power of 1
is deployed at the base station, the time required for one calculation is approximately 0.8
. This can fully meet the performance requirements of a scenario with a vehicle speed of 120
and a coherence time of 3–5
between RIS-BS links, and it has practical significance for engineering deployment.
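As a rough, hedged check of this cost argument, the following snippet estimates the forward-pass FLOPs of a fully connected Actor under the layer sizes stated in Section 4; the per-layer counting convention and the example state/action dimensions are assumptions, not the paper's exact figures:

```python
def mlp_flops(state_dim, action_dim, hidden_width, hidden_layers):
    """Approximate multiply-accumulate count (x2 for FLOPs) of one forward pass."""
    macs = state_dim * hidden_width                 # input layer
    macs += (hidden_layers - 1) * hidden_width ** 2  # hidden-to-hidden layers
    macs += hidden_width * action_dim               # output layer
    return 2 * macs

# Example: 2 hidden layers of 20 neurons, with assumed state/action dimensions.
flops = mlp_flops(state_dim=128, action_dim=80, hidden_width=20, hidden_layers=2)
print(f"~{flops} FLOPs per forward pass")
# With a platform sustaining F FLOP/s, one evaluation takes roughly flops / F seconds.
```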
This algorithm can also be extended to scenarios with higher user density and random mobility models. The original state
only contains the geometric and channel information of a single user’s straight-line path. For a large number
U of users with random trajectories in complex scenarios, the original state can be rewritten as
where
represents the user’s position,
represents the user’s movement state, and
H represents the channel state.
The multi-user interference power is then updated to
For the TDDPG algorithm, the network structure does not need to be modified. As the number of users increases, simply increasing the width of the hidden layers can enable the algorithm to be extended to more complex scenarios.
4. Simulation Results and Analysis
To verify the effectiveness of the proposed model and the joint optimization algorithm based on the TDDPG architecture, an IoV scenario is considered in this section, where the DFBS, equipped with antennas, is located at , and the initial positions of the target users are set to , , and ; the users move in a straight line along the x-axis. The two hybrid RIS locations are set to and .
The direct base station–vehicle link is treated as an NLoS link. During the optimization process, the direct NLoS channel between the base station and the vehicle is essentially ignored, i.e., it is assumed to be in a deep-fading state. If this direct link is to be simulated explicitly, it is modeled as Rayleigh fading with extremely low average power. In practice, this means that the channel coefficient between the base station and the user equipment is drawn from a Gaussian random variable with a mean of 0 and a variance of , which is determined by the large-scale path loss of the NLoS path. Since the average gain of the direct link is low in the Rayleigh scenario, the learned strategy naturally continues to rely on the RIS paths, which offer higher and more stable gains; that is, the strategy automatically excludes the unreliable direct path and focuses on the channels with a stronger deterministic component. Therefore, considering Rayleigh fading on the NLoS path does not affect the convergence result of the algorithm. In the simulation analysis, ideal LoS channels are used for the station–vehicle and station–station links.
In terms of algorithm design, the proposed TDDPG algorithm was compared with the DDPG algorithm, where each DDPG framework deploys two fully connected hidden layers with 20 neurons each in both the original networks and the target networks. The main simulation parameter settings are shown in Table 1 below.
When configuring the simulation and learning environment, we adopted a two-stage strategy that balances theoretical requirements and empirical verification. First, based on the typical hardware conditions of vehicular networking systems, we determined the system-level limitations and adopted a RIS array, into which 10 active elements were mixed. This choice balances performance and system complexity well, since further increasing the number of array elements brings a performance gain much smaller than the added deployment cost. The upper limit of the base station transmit power , the maximum total transmit power at the RIS , the path gain (large-scale fading) , and the noise power were all selected as commonly used engineering values. The CRLB threshold was set to 0.01, a typical level for vehicle-grade radar, corresponding to an angle estimation error of less than approximately 5.7°. In terms of training parameters, the replay buffer size was set to 1600, which retains a sufficient number of samples while shortening the computation time. The number of time steps per episode was set to 50, because in the simulations a shorter episode caused severe fluctuations in the training rewards, while a longer one led to large gradient deviations and drift in the update direction. During the Actor–Critic updates, mini-batch training with a batch size of 32 was adopted to balance gradient stability and computational load; this value was obtained empirically. The number of training episodes was set to 100, as the algorithm converged after around 40 episodes in the simulations, and 100 episodes better illustrate the full training curve.
Figure 3 compares the performance of the TDDPG algorithm and the original DDPG algorithm under different RIS phase shift optimization strategies in terms of average MUI. As shown in the figure, although both DDPG and TDDPG algorithms eventually converge, the TDDPG algorithm achieves faster convergence and better final performance, demonstrating stronger stability and adaptability to dynamic environments. Moreover, the continuous phase shift strategy (CPS) consistently outperforms the discrete phase shift strategy (DPS) under both algorithms. Unlike discrete phase shifts, which are limited to a set of fixed values, the continuous strategy enables fine-grained control of the RIS phase shifts across a wider range, allowing more precise manipulation of electromagnetic wave propagation and reflection. This significantly reduces the average interference in the system. Among all configurations, the TDDPG, combined with CPS, achieves the lowest average MUI, indicating the best overall performance.
Figure 4 presents the optimization performance of the TDDPG and original DDPG algorithms under CPS and DPS strategies in terms of communication sum rate. It is evident that the TDDPG algorithm outperforms the traditional DDPG under both CPS and DPS configurations, achieving faster convergence and a higher final sum rate. This demonstrates that the proposed TDDPG architecture more effectively coordinates the joint optimization of transmit beamforming and RIS phase shifts, fully exploiting the system’s performance potential.
Figure 5 shows the performance of hybrid RIS systems with different numbers of active elements under different RIS phase shift optimization strategies. It can be observed that as the number of active elements in the hybrid RIS increases, the system’s sum rate exhibits a steady upward trend. This is because a greater number of active elements provides higher degrees of freedom for fine-grained control over the phase and amplitude of signals, thereby enhancing beamforming capability and improving the quality and efficiency of communication links. However, the increase in active components also leads to higher manufacturing and maintenance costs. Moreover, as the number of active elements grows, the system’s total power consumption and heat dissipation requirements also rise, which may result in decreased energy efficiency. Therefore, in practical deployments, it is necessary to determine an appropriate number of active elements based on the specific application requirements and system constraints.
Figure 6 illustrates the impact of all active RISs and hybrid RISs with different proportions of active elements and all passive RISs on communication performance and sum rate. It can be observed that the fully active RIS achieves the best performance, reaching the highest sum rate, while the fully passive RIS exhibits the lowest performance. The hybrid RIS lies between the two, with its communication performance improving as the number of active elements increases. This indicates that a hybrid RIS offers a favorable trade-off between performance and cost. By appropriately configuring a portion of active elements, the system can significantly enhance its communication performance without substantially increasing complexity and power consumption. At the same time, it achieves considerable reductions in energy consumption and deployment cost, making it more practical for real-world applications. Moreover, the hybrid RIS architecture provides high flexibility, allowing the ratio of active to passive elements to be dynamically adjusted according to specific application scenarios and environmental conditions, thereby adapting to diverse communication requirements.
5. Conclusions
In this paper, a hybrid RIS-assisted communication method is considered for ISAC in IoV scenarios. Due to the additional noise introduced by the active components within the RIS, the SNR of both radar sensing and wireless communication may be adversely affected. This not only interferes with the normal communication process but also significantly degrades signal quality, thereby reducing overall system performance. To address this challenge, a joint optimization problem is formulated for the RIS phase shift matrix and transmit beamforming under constraints such as total transmit power, RIS-induced noise power, and the CRLB of angle estimation. A joint optimization algorithm based on a TDDPG architecture is proposed to tackle the tightly coupled optimization variables. The algorithm combines deep reinforcement learning with optimization theory and iteratively updates the RIS phase shifts and beamforming parameters to gradually approach the optimal solution. The simulation results demonstrate that the proposed TDDPG-based method effectively addresses the complex optimization problem. Compared with the conventional DDPG algorithm, the proposed approach exhibits faster convergence, improved stability, and better adaptability to dynamic environments.
However, this algorithm still has some limitations, especially in complex scenarios. For instance, if the system was originally trained on 20 users but the peak number of users in the scenario reaches 100, the system performance will inevitably decline. Moreover, in our ISAC design, the state space includes the continuous positions of the vehicles and the elements of the cascaded channel matrix H, while the action space consists of the RIS phase shift vector and the transmit beamforming vector. When the number of active RIS elements increases, the computation time grows exponentially. In future work, we will attempt to address these problems with methods such as distributed computing and online learning.
This study proposes an adaptive resource allocation framework for hybrid RISs, aiming to ensure communication reliability and efficiency in ISAC scenarios for the IoV. The findings provide a theoretical basis for RIS deployment in future 6G IoV networks. Future work will focus on extending the proposed optimization framework by integrating machine learning-based interference prediction and adaptive beamforming techniques to further enhance system performance in more complex and dynamic environments. Additionally, real-time hardware implementation and network scalability will be explored to validate the feasibility and practicality of the proposed solution in large-scale, high-density networks.