Joint Beam Switching and Beam Design for RIS-Assisted Multi-Base Station IoV

Lai, Jinxiang; Wang, Deqing; Zhao, Yifeng

doi:10.3390/app16115399

by Jinxiang Lai ¹, Deqing Wang ²,* and Yifeng Zhao ²

¹ College of General Education, Fujian Polytechnic of Water Conservancy and Electric Power, Yongan 366000, China

² Department of Information and Communication Engineering, Xiamen University, Xiamen 361005, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2026, 16(11), 5399; https://doi.org/10.3390/app16115399 (registering DOI)

Submission received: 23 April 2026 / Revised: 24 May 2026 / Accepted: 26 May 2026 / Published: 28 May 2026

(This article belongs to the Section Electrical, Electronics and Communications Engineering)

Abstract

With the wide application of artificial intelligence (AI) in the Internet of Vehicles (IoV), IoV is under pressure for data transmission and real-time sensing. Integrated sensing and communication (ISAC) is one of the key technologies to alleviate that pressure. Obstacles can cause communication disruptions and increased delays, hindering autonomous driving information acquisition and causing traffic hazards. The application of Reconfigurable Intelligent Surfaces (RISs) aims to solve this problem. This study focuses on RIS-assisted multi-base station (MBS) scenarios in the presence of obstacles. This study aims to maximize the communication rate, minimize the sensing error, and reduce the switching frequency by optimizing the RIS phase shift and beamforming. The problem is modeled as mixed integer nonlinear programming (MINLP) and further described as a Markov Decision Process (MDP). We use Long Short-Term Memory (LSTM) to predict the environmental state and propose two optimization algorithms, Multi-Factor Decision Deep Deterministic Policy Gradient (MFD-DDPG) and Mixed Discrete and Continuous Action DDPG (MDCA-DDPG). In the first algorithm, we consider multiple factors to make a switching decision and use DDPG to yield the optimal action. The second algorithm improves DDPG by outputting a discrete switching decision and a continuous optimized action simultaneously. Simulations show that the proposed algorithms significantly improve the system performance, and the communication rate is increased by more than 40% in specific multi-vehicle scenarios compared to the benchmark.

Keywords:

internet of vehicle; integrated sensing and communication; beam allocation; RIS reflection matrices; multi-base station switching

1. Introduction

The Internet of Vehicles (IoV) is a key component of the Internet of Things (IoT) in transportation [1]. With the rapid development of IoV, many vehicular applications have been combined with artificial intelligence (AI), such as autonomous driving and vehicular entertainment [2,3]. These combinations leverage the strengths of AI in data processing and real-time decision-making, improve efficiency, safety, and comfort while also providing personalized service. At the same time, the need for efficient perception and reliable communication in complex dynamic scenarios is becoming increasingly urgent. Vehicle-to-everything (V2X) can effectively meet this challenge by providing stable wireless communication services, enabling seamless connectivity across vehicle-to-vehicle (V2V), vehicle-to-infrastructure (V2I), vehicle-to-network (V2N), and vehicle-to-pedestrian (V2P) [4,5]. In the future, with the further integration of IoV and AI, V2X-based IoV systems are poised to become increasingly critical in achieving the goals of intelligent transportation and smart cities [6].

The rapid development of wireless communication technologies provides key technical support for intelligent vehicles in IoT. Vehicles need to realize accurate environment sensing through multi-sensor fusion, and at the same time rely on high-speed communication networks to interact with other traffic participants to build a dynamic sensing system integrating vehicle, road, and cloud. Researchers have increasingly focused on innovative technologies that integrate communication and sensing [7]. Integrated sensing and communication (ISAC) enables simultaneous communication and sensing, thereby enhancing spectrum efficiency and improving overall system performance [8]. Notably, incorporating sensing eliminates the need for pilots and feedback loops typically required in conventional millimeter wave (mmWave) beam tracking schemes, thereby minimizing processing delays [9,10].

In actual application scenarios, communication between the base station (BS) and the vehicle is often obstructed by various obstacles, highlighting the urgent need for relay technologies to reconstruct the communication link. Many researchers have begun exploring relay technologies in IoV [11]. In this field, Reconfigurable Intelligent Surface (RIS), as an innovative passive relay technology, has demonstrated its unique potential. RIS contains numerous passive reflective units that can dynamically modify the trajectory of reflected signals, thereby improving the wireless signal propagation environment [12]. For IoV systems that are highly susceptible to blockage, RIS offers an effective solution to significantly enhance signal quality and expand coverage. Especially in complex urban environments, the application of this technology is expected to greatly improve communication reliability and efficiency [13]. In current research on RIS-assisted ISAC, communication users and sensing targets are often considered as separate entities. The optimization of sensing and communication performance often involves designing the transmit beams for a single BS and the RIS phase shift matrix, or configuring sensing beam patterns to improve the sensing signal-to-noise ratio (SNR) [14,15,16].

As the complexity of IoV environments increases, the coverage and functionality of a single BS become limited. The multi-base station (MBS) system not only significantly expands the coverage area, but also greatly improves the sensing capabilities of the BSs for vehicles and further enhances the overall communication performance of the system [17,18]. When obstacles interrupt BS-vehicle communication, robust mobility management is needed to reduce signal interruptions and costs associated with frequent switching, ensuring stable communication [19]. Attaining seamless switching and lowering switching failure rates during BS changeover are major priorities.

In addition, research has shown that RIS can significantly improve ISAC network performance [20,21]. In particular, in scenarios where line-of-sight (LOS) links are subject to obstruction between BSs and communication users, RIS can help to establish virtual LOS links to support communication [22]. This paper focuses on the RIS-assisted MBS ISAC scenario with the aim of exploring how this technology improves traffic efficiency and safety in urban environments. This technology not only addresses the challenges of signal attenuation and interference encountered by traditional IoV in complex environments but also enhances the stability of communication links and further expands signal coverage. It offers vehicles more stable and efficient communication services. The key contributions of this paper are outlined as follows.

We construct a RIS-assisted MBS ISAC scenario. This optimization problem maximizes the communication rate while minimizing sensing errors and switching counts by jointly optimizing the RIS phase shift and beamforming.

This problem is a mixed integer nonlinear programming (MINLP) problem [23]. We formulate it as a Markov Decision Process (MDP) and propose two algorithms—Multi-Factor Decision Deep Deterministic Policy Gradient (MFD-DDPG) and Mixed Discrete and Continuous Action DDPG (MDCA-DDPG). In the first algorithm, we consider multiple factors to decide whether to switch or not, and use the DDPG to solve the other optimization variables. In the second algorithm design, we propose an improved DDPG strategy that can simultaneously support the joint processing of discrete and continuous action spaces, where a discrete action is determined by a specific judgment threshold.

The simulation findings support the effectiveness and convergence of the proposed algorithms. These algorithms achieve higher performance compared to the benchmark algorithms, and the communication rate increases with the number of RISs and BSs. Although the MDCA-DDPG works better than the MFD-DDPG, it leads to more frequent BS switching.

The outline of the remaining sections is as follows. Section 2 presents the related work. Section 3 presents the system model. Section 4 proposes two algorithms to address the formulated problem. Section 5 provides numerical results, and conclusions are drawn in Section 6.

2. Related Work

Many scholars have studied ISAC. Yu et al. improved the radar signal-to-interference-plus-noise ratio (SINR) and ensured communication quality of service (QoS) by optimizing beamforming and RIS reflection coefficients [24]. The author of [25] proposed two alternating optimization methods for cooperative optimization of target illumination power and communication SINR, demonstrating notable advantages in scenarios with blocked LOS links. Long et al. introduced the Federated Learning-DDPG (FL-DDPG) algorithm to optimize beamforming and phase control, improving positioning accuracy and communication performance [26]. Xia et al. explored predictive beamforming strategies for RIS-assisted V2I systems and proposed two deep learning-based algorithms that leverage sensing information to reduce channel estimation overhead and optimize communication performance, addressing uncertainties in highly dynamic scenarios [27]. The author of [28] proposed a RIS-assisted ISAC system that integrates beam training with target sensing, optimizing beam training and target localization algorithms to improve sensing and communication performance while significantly reducing channel training overhead. These studies have made significant progress in RIS-assisted ISAC system optimization, especially in algorithm design and performance enhancement. However, they pay insufficient attention to the need for MBS collaboration in complex scenarios.

For MBS scenarios, designing a reasonable switching decision is essential [29]. Tang et al. proposed a scheme for mmWave communication blockage prediction and fast BS switching based on the received signal reference power of mobile terminals and the BS transmission beam index, achieving seamless switching through neighborhood beam search and LSTM neural networks [30]. Zhang et al. proposed an enhanced switching scheme based on beamforming for 5G heterogeneous networks, which improved signal strength and switching success rates by dynamically adjusting switching parameters [31]. The authors of [32] proposed a multi-agent Q-learning framework for load balancing, user association, and seamless switching in mobile networks, which enhanced network throughput and reduced switching costs. Kose et al. developed an analytical model to predict vehicle dwell time in beam coverage and designed a distributed beam-centric switching algorithm to extend dwell time and reduce switching [33]. For resource allocation in MBS systems, Zhao et al. proposed a rate maximization scheme for multi-RIS-assisted mmWave downlink systems by optimizing RIS, power allocation, and user association [34]. Han et al. addressed BS energy consumption and introduced a hybrid optimization algorithm to maximize user SINR in a heterogeneous MBS network [35]. In MBS sensing positioning, Tong et al. developed a multi-layer factor graph iterative estimation method for environmental sensing, achieving breakthroughs in detecting scattering coefficients and occlusions in wireless cellular networks [36]. Recent studies on MBS ISAC scenarios include a cooperative sensing framework proposed by Wei et al., which overcame the limitations of a single BS, together with a symbol-level sensing fusion algorithm for precise target tracking [37]. Zhang et al. used Kalman filtering-based resource allocation to balance sensing and communication [38]. These studies investigate switching, resource allocation, and sensing localization issues in MBS scenarios, proposing intelligent algorithms such as LSTM, Q-learning, and Kalman filtering to optimize wireless communication performance and advance related technologies.

Although previous studies have explored MBS systems with ISAC, they have not adequately addressed the challenges in dynamic MBS environments and relay devices. This paper fills this gap by introducing adaptive beamforming and switching strategies. It considers obstacles and introduces a novel reinforcement learning-based beam switching mechanism to enhance performance.

3. System Model

In this paper, we propose a RIS-assisted MBS scenario that includes multiple BSs, multiple RISs, and multiple vehicles, as shown in Figure 1. The frequency bands used by different BSs are orthogonal, eliminating interference between the BSs. Each BS exclusively receives the corresponding RIS signal, and the BSs share information through time synchronization managed by a control center. We directly adopt a number of widely used clock synchronization techniques in this paper [37].

Let $\mathcal{R} = \{1, 2, \dots, R\}$, $\mathcal{B} = \{1, 2, \dots, B\}$, and $\mathcal{K} = \{1, 2, \dots, K\}$ denote the RIS set, the BS set, and the vehicle set, respectively. Each BS is integrated with an RIS. Considering the deployment heights of the BS and the RIS, the links between the BS and the RIS are not affected by the obstacles. The BSs are equipped with $N_{t1}, N_{t2}, \dots, N_{tB}$ transmit antennas and $N_{r1}, N_{r2}, \dots, N_{rB}$ receive antennas, which satisfy $N_{tb}, N_{rb} > K$ for $b = 1, \dots, B$. The numbers of elements of the RISs are denoted as $M_{1}, M_{2}, \dots, M_{R}$.

3.1. The Association Model Between BSs and Vehicles

The association between BSs and vehicles is represented by the association matrix $A$ [34]:

A = [a_{1}, \dots, a_{k}, \dots, a_{K}] = \begin{bmatrix} \underbrace{a_{1, 1}, a_{1, 2}, \dots, a_{1, K}}_{\tilde{a}_{1}} \\ \vdots \\ \underbrace{a_{B, 1}, a_{B, 2}, \dots, a_{B, K}}_{\tilde{a}_{B}} \end{bmatrix}

(1)

Here $a_{k} = [a_{1, k}, \dots, a_{b, k}, \dots, a_{B, k}]^{T} \in \mathbb{C}^{B \times 1}$ is the association vector of the k-th vehicle with all BSs. The constraint $\| a_{k} \| = 1$ implies that each vehicle is served by exactly one BS. $a_{b, k}$ is the association coefficient between the k-th vehicle and the b-th BS: $a_{b, k} = 1$ indicates an association, and $a_{b, k} = 0$ indicates no association. For each vehicle, the BSs associated with it over a period of time can be represented as

c_{k, T} = [c_{k, 1}, \dots, c_{k, t}, \dots, c_{k, T}]

(2)

where $c_{k, t} \in \{1, \dots, B\}$ is the index of the BS associated with the k-th vehicle in the t-th time slot.

If the association at a given time slot differs from that of the previous time slot, a BS switch has occurred. Therefore, the number of BS switches for the k-th vehicle during the time period can be denoted as

Q_{k} = \sum_{t = 2}^{T} q_{k, t}

(3)

where $q_{k, t}$ satisfies

q_{k, t} = \begin{cases} 1, & c_{k, t} \neq c_{k, t - 1}, \\ 0, & c_{k, t} = c_{k, t - 1}. \end{cases}

(4)

$\tilde{a}_{b}$ and $\| \tilde{a}_{b} \|$ represent the association vector of BS b and the number of vehicles it serves, respectively. In this paper, BS b serves the vehicles in the set $\mathcal{A}(b)$.
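The association bookkeeping of Equations (1) to (4) can be illustrated with a minimal NumPy sketch. The function names `association_matrix` and `switch_count` and the example data are hypothetical and are not taken from the paper; BS indices are 0-based when building the matrix purely for array indexing.

```python
import numpy as np

def association_matrix(serving_bs, num_bs):
    """Build the B x K binary matrix A of Eq. (1) from 0-based serving-BS indices."""
    serving_bs = np.asarray(serving_bs)
    A = np.zeros((num_bs, serving_bs.size), dtype=int)
    A[serving_bs, np.arange(serving_bs.size)] = 1  # one 1 per column (vehicle)
    return A

def switch_count(c_k):
    """Q_k of Eq. (3): number of slots whose serving BS differs from the previous slot."""
    c_k = np.asarray(c_k)
    return int(np.count_nonzero(c_k[1:] != c_k[:-1]))

# Illustrative data (assumed): 4 vehicles, 3 BSs.
A = association_matrix([0, 2, 2, 1], num_bs=3)
vehicles_per_bs = A.sum(axis=1)      # number of vehicles served by each BS
history = [1, 1, 2, 2, 2, 1, 3]      # c_{k,1..T} for one vehicle
print(A.sum(axis=0), vehicles_per_bs, switch_count(history))
```

Each column of `A` sums to one, which mirrors the requirement that every vehicle is served by exactly one BS.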

3.2. Communication Model

During each time slot, the BS sends ISAC signals to the served vehicles [33], represented as

x_{b} = \sum_{k \in \mathcal{A}(b)} w_{b, k} s_{b, k}

(5)

where $w_{b, k} \in \mathbb{C}^{N_{tb} \times 1}$ is the beamforming vector transmitted by BS b to the k-th vehicle, and $s_{b, k}$ is the data symbol to be transmitted. $W_{b} = [w_{b, 1}, \dots, w_{b, k}, \dots, w_{b, K_{b}}]$ is the transmit beamforming matrix of BS b, and $s_{b} = [s_{b, 1}, \dots, s_{b, k}, \dots, s_{b, K_{b}}]$ collects the data symbols transmitted by BS b, which satisfy $E\{s_{b, k} s_{b, k}^{*}\} = 1$ [39].

The communication signal that the k-th vehicle receives from the serving BS b can be formulated as

y_{k, c} = H_{b, k} w_{b, k} s_{b, k} + H_{b, k} \sum_{j \neq k, j \in \mathcal{A}(b)} w_{b, j} s_{b, j} + n_{k, c}

(6)

where $n_{k, c} \sim \mathcal{CN}(0, σ_{c}^{2})$ is the noise signal. $H_{b, k}$ is the cascaded channel linking the BS and the vehicle, represented as

H_{b, k} = ζ_{b, k} h_{b, k}^{H} + h_{r, k}^{H} Θ_{r} G_{b, r}^{H}

(7)

where $ζ_{b, k}$ is the blocking coefficient for the direct link between the BS and the k-th vehicle: $ζ_{b, k} = 1$ when the channel is not blocked, and $ζ_{b, k} = 0$ when it is blocked by obstacles.

$h_{b, k} \in \mathbb{C}^{N_{tb} \times 1}$ represents the channel vector of the BS-vehicle link and can be expressed as

h_{b, k} = \sqrt{α_{b, k}} [1, e^{- j 2 π d \sin(θ_{b, k}) / λ}, \dots, e^{- j 2 π d (N_{tb} - 1) \sin(θ_{b, k}) / λ}]^{T}

(8)

where $d$ and $λ$ represent the spacing between antenna elements and the wavelength of the signal, respectively. $θ_{b, k}$ denotes the angle of departure from the serving BS to the k-th vehicle, and $α_{b, k}$ is the channel gain between BS b and the k-th vehicle [40], given by

α_{b, k} = ζ_{0} \left(\frac{d_{b, k}}{d_{0}}\right)^{- β_{b, k}}

(9)

where $d_{b, k}$ represents the distance between the serving BS b and the k-th vehicle, $β_{b, k}$ denotes the path loss factor of the link between the serving BS b and the k-th vehicle, and $ζ_{0}$ represents the path loss at the reference distance $d_{0} = 1$ m.
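Equations (8) and (9) can be checked numerically with a short NumPy sketch. The helper names `path_gain` and `ula_channel` are hypothetical, and the numerical values (distance, angle, path loss factor, reference path loss, half-wavelength spacing) are illustrative assumptions rather than the simulation settings of the paper.

```python
import numpy as np

def path_gain(distance_m, beta, zeta0, d0=1.0):
    """Eq. (9): distance-dependent channel gain with reference distance d0."""
    return zeta0 * (distance_m / d0) ** (-beta)

def ula_channel(distance_m, theta_rad, n_elements, beta, zeta0, d_over_lambda=0.5):
    """Eq. (8): uniform-linear-array channel vector scaled by sqrt(path gain)."""
    n = np.arange(n_elements)
    phase = -2j * np.pi * d_over_lambda * n * np.sin(theta_rad)
    return np.sqrt(path_gain(distance_m, beta, zeta0)) * np.exp(phase)

# Assumed example: 8 antennas, vehicle 50 m away at 30 degrees.
h = ula_channel(distance_m=50.0, theta_rad=np.deg2rad(30.0),
                n_elements=8, beta=2.2, zeta0=1e-3)
print(h.shape, np.abs(h[0]) ** 2)
```

Every entry of `h` has the same magnitude, so the per-element power equals the path gain of Equation (9). The same function applies to the RIS-vehicle link by passing the number of RIS elements and the corresponding angle.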

$h_{r, k} \in \mathbb{C}^{M_{r} \times 1}$ represents the channel vector between the serving RIS r and the k-th vehicle, which is given by

h_{r, k} = \sqrt{α_{r, k}} [1, e^{- j 2 π d \sin(θ_{r, k}) / λ}, \dots, e^{- j 2 π d (M_{r} - 1) \sin(θ_{r, k}) / λ}]^{T}

(10)

where $θ_{r, k}$ represents the angle between the serving RIS and the k-th vehicle, and $α_{r, k}$ is the channel gain of the link between the RIS and the k-th vehicle, given by

α_{r, k} = ζ_{0} \left(\frac{d_{r, k}}{d_{0}}\right)^{- β_{r, k}}

(11)

where $d_{r, k}$ represents the distance from RIS r to the k-th vehicle, and $β_{r, k}$ denotes the path loss factor between RIS r and the k-th vehicle.

$Θ_{r} \in \mathbb{C}^{M_{r} \times M_{r}}$ is the phase shift matrix of the r-th RIS, given by

Θ_{r} = \mathrm{diag}(θ_{1, r}, \dots, θ_{m, r}, \dots, θ_{M, r})

(12)

The reflection coefficients satisfy the unit-modulus constraint $| θ_{m, r} | = 1$, $m = 1, \dots, M$. $G_{b, r} \in \mathbb{C}^{N_{tb} \times M_{r}}$ represents the channel matrix between BS b and RIS r.

In this paper, this channel is modeled as a Rician channel [39]:

G_{b, r} = \sqrt{\frac{κ}{1 + κ}} G_{\mathrm{LOS}} + \sqrt{\frac{1}{1 + κ}} G_{\mathrm{NLOS}}

(13)

where $κ$ represents the Rician factor. $G_{\mathrm{NLOS}}$ represents the NLOS component, whose entries are modeled as zero-mean circularly symmetric complex Gaussian random variables, so that both the amplitude and the phase are random. $G_{\mathrm{LOS}}$ represents the LOS component and is given by

G_{\mathrm{LOS}} = a_{r, b}(θ_{r, b}) a_{b, r}(θ_{b, r})^{H}

(14)

where $a_{r, b}(θ_{r, b}) \in \mathbb{C}^{M_{r} \times 1}$ is the steering vector at RIS r for the link with BS b, and $a_{b, r}(θ_{b, r}) \in \mathbb{C}^{N_{tb} \times 1}$ is the steering vector at BS b for the link with RIS r. They are expressed as

a_{b, r}(θ_{b, r}) = [1, e^{- j 2 π d \sin(θ_{b, r}) / λ}, \dots, e^{- j 2 π d (N_{tb} - 1) \sin(θ_{b, r}) / λ}]^{T}

(15)

a_{r, b}(θ_{r, b}) = [1, e^{- j 2 π d \sin(θ_{r, b}) / λ}, \dots, e^{- j 2 π d (M_{r} - 1) \sin(θ_{r, b}) / λ}]^{T}

(16)

Thus, the SINR at the receiver of the k-th vehicle served by BS b can be written as

\mathrm{SINR}_{k} = \frac{|(ζ_{b, k} h_{b, k}^{H} + h_{r, k}^{H} Θ_{r} G_{b, r}^{H}) w_{b, k} s_{b, k}|^{2}}{\sum_{j \neq k, j \in \mathcal{A}(b)} |(ζ_{b, k} h_{b, k}^{H} + h_{r, k}^{H} Θ_{r} G_{b, r}^{H}) w_{b, j} s_{b, j}|^{2} + σ_{c}^{2}}

(17)

The interference received by vehicle k comes mainly from the signals intended for other vehicles served by the same BS b. Since orthogonal frequency bands are used between BSs, the interference from non-serving BSs to the vehicle is not considered in this formula. The total communication rate of all vehicles in this scenario can be obtained as

E = \sum_{k = 1}^{K} \log_{2}(1 + \mathrm{SINR}_{k})

(18)
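As a numerical illustration of Equations (7), (17), and (18), the following NumPy sketch evaluates the cascaded channel, the per-vehicle SINR, and the sum rate for the vehicles of a single BS. The function names, the random channels, and the matched-filter beamformer are assumptions made for the example; they are not the optimized beams of the proposed algorithms. Unit-power symbols are assumed, so the symbol terms drop out of the SINR.

```python
import numpy as np

def cascaded_channel(zeta, h_bk, h_rk, theta_diag, G_br):
    """Eq. (7): zeta * h_bk^H + h_rk^H Theta_r G_br^H as a length-N_t row vector."""
    # theta_diag holds the diagonal of Theta_r, so h^H Theta is an elementwise product.
    return zeta * h_bk.conj() + (h_rk.conj() * theta_diag) @ G_br.conj().T

def sum_rate(H_rows, W, noise_var):
    """Eqs. (17)-(18). H_rows: (K_b, N_t) channels, W: (N_t, K_b) beamformers."""
    power = np.abs(H_rows @ W) ** 2            # power[k, j] = |H_k w_j|^2
    signal = np.diag(power)
    interference = power.sum(axis=1) - signal  # beams meant for other vehicles
    sinr = signal / (interference + noise_var)
    return sinr, float(np.sum(np.log2(1.0 + sinr)))

def crandn(rng, *shape):
    """Complex Gaussian samples with unit variance (illustrative channels only)."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

rng = np.random.default_rng(0)
N_t, M, K_b = 4, 16, 2                          # assumed sizes
G = crandn(rng, N_t, M)
theta = np.exp(1j * rng.uniform(0, 2 * np.pi, M))  # unit-modulus RIS coefficients
H = np.stack([cascaded_channel(1, crandn(rng, N_t), crandn(rng, M), theta, G)
              for _ in range(K_b)])
W = H.conj().T / np.linalg.norm(H)              # matched filter, total power 1 (assumption)
sinr, rate = sum_rate(H, W, noise_var=1e-2)
print(sinr, rate)
```

Setting `zeta` to 0 removes the direct path, which leaves only the RIS-reflected term and mimics a blocked LOS link.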

3.3. Sensing Model

After the BS sends downlink communication signals to the vehicles, it will receive echo signals bounced back from the vehicles. Owing to the substantial path loss associated with multiple reflection paths through the RIS, we only consider the direct reflection path. This assumption is consistent with mmWave IoV scenarios, where high path loss and strong directionality cause most of the received power to concentrate in the LOS and the strongest NLOS component, while other weak reflections are typically 15–20 dB lower and can be neglected. Moreover, the RIS phase matrix is designed to focus the reflected energy toward the dominant path direction, making the equivalent channel effectively single-path dominated. Nevertheless, the proposed optimization framework can be readily extended to multipath channels by introducing independent angle and gain parameters for each path. When the LOS link is not blocked, the signal reflected back to the BS b from the k-th vehicle it serves can be written as [14]

r_{k} = α_{b, k, r} e^{j 2 π μ_{b, k}} b (θ_{b, k}) a {(θ_{b, k})}^{H} W s (t - τ) + n_{r}

(19)

where $τ = 2 d_{b, k} / c$ represents the round-trip time delay of the echo signal reflected from the k-th vehicle to the BS, and $α_{b, k, r}$ is the sensing channel fading coefficient from the BS to the k-th vehicle [41], which can be expressed as

α_{b, k, r} = \sqrt{\frac{λ^{2} σ_{\mathrm{RCS}}}{64 π^{3} d_{b, k}^{4}}}

(20)

where $σ_{\mathrm{RCS}}$ represents the Radar Cross Section (RCS). $μ_{b, k}$ represents the Doppler shift of the echo signal from the k-th vehicle, expressed as

μ_{b, k} = \frac{2 v_{k} f_{c}}{c}

(21)

where $v_{k}$ represents the velocity of the k-th vehicle, $f_{c}$ represents the carrier frequency, and $c$ is the speed of light. $a(θ_{b, k})$ and $b(θ_{b, k})$ stand for the transmit and receive steering vectors at the BS, respectively, and are expressed as

a(θ_{b, k}) = [1, e^{- j π \sin θ_{b, k}}, \dots, e^{- j π (N_{tb} - 1) \sin θ_{b, k}}]^{T}

(22)

b(θ_{b, k}) = [1, e^{- j π \sin θ_{b, k}}, \dots, e^{- j π (N_{rb} - 1) \sin θ_{b, k}}]^{T}

(23)

$n_{r} \sim \mathcal{CN}(0, σ_{r}^{2})$ represents the echo noise signal, where $σ_{r}^{2}$ represents the noise variance.
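The scalar sensing quantities in the echo model (round-trip delay, fading coefficient of Equation (20), and Doppler shift of Equation (21)) can be evaluated with a few lines of Python. The function names and the example values (28 GHz carrier, 20 m/s velocity, 60 m range, 1 m² RCS) are illustrative assumptions and are not taken from the paper.

```python
import math

C_LIGHT = 299_792_458.0  # speed of light in m/s

def echo_delay(distance_m):
    """Round-trip delay tau = 2 d / c of the echo."""
    return 2.0 * distance_m / C_LIGHT

def doppler_shift(velocity_mps, carrier_hz):
    """Eq. (21): Doppler shift of the echo for velocity v_k."""
    return 2.0 * velocity_mps * carrier_hz / C_LIGHT

def sensing_gain(wavelength_m, rcs_m2, distance_m):
    """Eq. (20): sensing channel fading coefficient."""
    return math.sqrt(wavelength_m ** 2 * rcs_m2 / (64.0 * math.pi ** 3 * distance_m ** 4))

fc = 28e9                     # assumed mmWave carrier frequency
lam = C_LIGHT / fc
print(echo_delay(60.0), doppler_shift(20.0, fc), sensing_gain(lam, 1.0, 60.0))
```

Because the coefficient scales with the inverse square of the distance, doubling the range reduces it by a factor of four, which is consistent with neglecting the weaker multi-bounce paths.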

3.4. Inter-BS Switching Model

Initialization Phase: The BS establishes a connection with the vehicles by transmitting pilot signals and determines their initial positions using beam scanning technology [42].

ISAC Signal Transmission: After determining the position of the vehicles, the BS transmits ISAC signals to them. This technology enables simultaneous downlink communication and sensing, allowing real-time acquisition of the motion state of the vehicles and Channel State Information (CSI). The echo signal is shown in Equation (19).

BS Switching Decision: The BS obtains vehicle motion status and position prediction information by analyzing the echo signals, and evaluates whether an inter-BS switching decision is necessary. If the result indicates that a switch is required, the current serving BS transmits a signal to notify the next BS that it is ready to communicate with the vehicle. The switching decision process of the BS can be represented as

p = h(v, θ, ζ, E), \quad p \in \{0, 1\}

(24)

Here $p$ represents the switching decision index, where $p = 0$ indicates no switching and $p = 1$ indicates an inter-BS switching. On the right side of the expression, $v$ represents the current travel direction of the vehicle, $θ$ represents the angle between the BS and the vehicle, $ζ$ represents the channel blockage coefficient, $E$ represents the communication rate, and $h(\cdot)$ is a decision function that comprehensively considers multiple factors to determine whether to perform the BS switching.

Although orthogonal frequency bands are allocated to different base stations (BSs) to eliminate inter-cell interference, the multi-BS (MBS) scenario still differs essentially from the single-BS (SBS) case. Specifically, the MBS architecture provides multiple candidate links for each vehicle, enabling dynamic BS switching when the current link experiences blockage or degradation, thereby ensuring continuous and reliable communication. Moreover, MBS systems allow cooperative beamforming and RIS control among BSs, extending the coverage area and improving sensing accuracy. In addition, the handover frequency and decision strategy under high-mobility conditions represent a dynamic optimization challenge that does not exist in the SBS case. Therefore, the focus of this work is on cooperative sensing and dynamic switching among multiple BSs rather than interference mitigation through spectrum reuse.

3.5. Problem Formulation

We define a problem that aims to maximize the communication rate while minimizing the BS sensing error and the number of BS switches. Let $β_{c}$, $β_{r}$, and $β_{q}$ denote the weights for communication, sensing, and the number of switches, respectively. This objective is set to balance communication performance, sensing accuracy, and the number of BS switches, ensuring that the IoV system can accurately sense the state of the vehicles and minimize switching while transmitting data. We have

\begin{aligned} \max_{W, Θ} \quad & β_{c} \sum_{k = 1}^{K} \log_{2}(1 + \mathrm{SINR}_{k}) - β_{r} \sum_{k = 1}^{K} |\tilde{θ}_{k} - θ_{k}| - β_{q} \sum_{k = 1}^{K} Q_{k} \\ \mathrm{s.t.} \quad & \mathrm{C1}: \mathrm{SINR}_{k} \geq η_{c}, \quad k = 1, 2, \dots, K, \\ & \mathrm{C2}: |\tilde{θ}_{k} - θ_{k}| \leq η_{r}, \quad k = 1, 2, \dots, K, \\ & \mathrm{C3}: ζ_{b, k} \in \{0, 1\}, \quad k = 1, 2, \dots, K, \\ & \mathrm{C4}: | θ_{m, r} | = 1, \quad m = 1, 2, \dots, M, \; r = 1, 2, \dots, R, \\ & \mathrm{C5}: \sum_{b = 1}^{B} \| W_{b} \|_{F}^{2} \leq P_{\max}, \\ & \mathrm{C6}: \| a_{k} \| = 1, \quad k = 1, 2, \dots, K, \\ & \mathrm{C7}: \| \tilde{a}_{b} \| \geq 0, \quad b = 1, 2, \dots, B. \end{aligned}

(25)

C1 ensures that the SINR of each vehicle, and hence its communication rate, meets the minimum threshold $η_{c}$. C2 requires that the sensed angle error of the vehicle relative to the BS stays within the maximum threshold $η_{r}$. Specifically, the sensing error is formally defined as the absolute angular error between the estimated and true angles. Note that this error is minimized independently via the LSTM loss function rather than the DDPG reward. C3 ensures that the link status takes only two values, blocked or connected. C4 constrains the phase shift matrix of the RIS, ensuring that the modulus of each reflection coefficient is always 1. C5 limits the total transmission power of the BSs, ensuring it does not exceed the maximum threshold $P_{\max}$. C6 indicates that each vehicle communicates with only one BS at any given time. C7 ensures that the number of vehicles served by each BS is non-negative.
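To make the weighted objective of Equation (25) concrete, the sketch below evaluates it for given SINRs, angle estimates, and switch counts, and checks constraints C1, C2, and C5. The function names `objective` and `feasible` and all numerical inputs are hypothetical; the sketch only evaluates a candidate solution and does not solve the MINLP.

```python
import numpy as np

def objective(sinr, angle_est, angle_true, switches, beta_c, beta_r, beta_q):
    """Weighted objective of Eq. (25): rate minus sensing error minus switch count."""
    rate = np.sum(np.log2(1.0 + np.asarray(sinr, dtype=float)))
    sensing_err = np.sum(np.abs(np.asarray(angle_est) - np.asarray(angle_true)))
    return float(beta_c * rate - beta_r * sensing_err - beta_q * np.sum(switches))

def feasible(sinr, angle_est, angle_true, W_list, eta_c, eta_r, p_max):
    """Check C1 (SINR floor), C2 (angle error cap), and C5 (total transmit power)."""
    c1 = np.all(np.asarray(sinr) >= eta_c)
    c2 = np.all(np.abs(np.asarray(angle_est) - np.asarray(angle_true)) <= eta_r)
    total_power = sum(np.sum(np.abs(W) ** 2) for W in W_list)  # sum of squared Frobenius norms
    return bool(c1 and c2 and total_power <= p_max)

# Assumed example with two vehicles and one BS.
value = objective(sinr=[1.0, 3.0], angle_est=[0.10, 0.20], angle_true=[0.10, 0.20],
                  switches=[0, 1], beta_c=1.0, beta_r=1.0, beta_q=1.0)
ok = feasible([1.0, 3.0], [0.10, 0.20], [0.10, 0.20], [np.eye(2)],
              eta_c=0.5, eta_r=0.05, p_max=4.0)
print(value, ok)
```

The weights trade the three terms against each other: a larger `beta_q` penalizes each BS switch more heavily relative to the rate gained by switching.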

4. Proposed Algorithm

We model the problem as an MDP, which primarily consists of the following key elements: an agent, a set of environmental states, a reward function, and a set of actions. This paper proposes two improved DDPG-based algorithms to solve this problem. The MFD-DDPG focuses on reducing switching frequency by considering signal strength and blockage conditions, while the MDCA-DDPG integrates both discrete and continuous actions to achieve optimal beamforming. In the V2X, CSI exhibits significant time-varying characteristics due to the high-speed mobility of vehicles and the complex propagation environment. In order to improve the reliability of the communication system, this paper adopts the LSTM network to construct a CSI prediction model, which realizes the accurate estimation of the dynamic channel by capturing the time-dependent characteristics of the channel parameters, so as to provide the system with accurate CSI assistance.

4.1. LSTM-Based Prediction Algorithm

LSTM is an improved recurrent neural network that bridges historical information and current tasks. In this research, vehicles are constantly moving, and the channel state changes dynamically with their movement. When predicting the position information for the next time slot, the historical information is defined as a time-dependent dynamic sequence. This sequence is fed to the LSTM along its forward chain structure and can be written as

D(t) = [p(t - T_{s}), p(t - T_{s} + 1), \dots, p(t - 1)]

(26)

where $T_{s}$ represents the duration of the time series, i.e., the time span considered for each dynamic sequence. $p(t) = [p_{1}(t), \dots, p_{k}(t), \dots, p_{K}(t)]$ represents the set of all vehicle states in the IoV at time t, and $p_{k}(t)$ denotes the input information of the LSTM network for the k-th vehicle, which mainly includes the following:

p_{k}(t) = [θ_{b, k}, θ_{r, k}, ζ_{b, k}, ζ_{r, k}, r_{k}]

(27)

We use the traditional LSTM network structure, which consists of a chain of repeating modules. Each module contains several major interacting components that work together to enable the network to preserve important information while processing long-term sequential data. The first component is the forget gate $f(t)$, which determines which parts of the previous cell state $C(t - 1)$ will be maintained in the current state $C(t)$. It takes the previous hidden state $H(t - 1)$ together with the current input $D(t)$ and is computed with the weights $W^{f}$ and $\hat{W}^{f}$ and the bias $B^{f}$ as

f(t) = σ(W^{f} D(t) + \hat{W}^{f} H(t - 1) + B^{f})

(28)

The superscript $f$ marks the weights and biases of the forget gate, and $σ$ denotes the gate activation function, which is typically the Sigmoid function. The input sequence $D(t)$ is composed of the vehicle states starting from time $t - T_{s}$. The forget gate helps the LSTM maintain historical information.

The input gate regulates the flow of the current input $D(t)$, ensuring that only relevant content is updated into the LSTM state $C(t)$. The Sigmoid function is used to implement the input gate, while the new candidate state $d(t)$ is generated using the tanh function. The expressions are [43]

i(t) = σ(W^{i} D(t) + \hat{W}^{i} H(t - 1) + B^{i})

(29)

d(t) = \tanh(W^{d} D(t) + \hat{W}^{d} H(t - 1) + B^{d})

(30)

The superscripts $i$ and $d$ mark the weights and biases associated with the input gate and the candidate memory cell state, respectively.

The output gate determines the portion of the cell state $C(t)$ that contributes to the output. It is given by

o(t) = σ(W^{o} D(t) + \hat{W}^{o} H(t - 1) + B^{o})

(31)

H(t) = o(t) \otimes \tanh(C(t))

(32)

The superscript $o$ marks the weights and biases of the output gate. The update for the cell state $C(t)$ at time t is given by

C(t) = f(t) \otimes C(t - 1) + i(t) \otimes d(t)

(33)
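The gate equations (28) to (33) translate directly into a single LSTM cell step. The following NumPy sketch is a minimal illustration with randomly initialized parameters; the dictionary keys (`Wf`, `Uf`, `Bf`, and so on, with `U` standing in for the hatted weights), the hidden size, and the sequence length are assumptions for the example and do not reflect the trained network of the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, p):
    """One LSTM cell update following Eqs. (28)-(33)."""
    f = sigmoid(p["Wf"] @ x + p["Uf"] @ h_prev + p["Bf"])   # forget gate, Eq. (28)
    i = sigmoid(p["Wi"] @ x + p["Ui"] @ h_prev + p["Bi"])   # input gate, Eq. (29)
    d = np.tanh(p["Wd"] @ x + p["Ud"] @ h_prev + p["Bd"])   # candidate state, Eq. (30)
    o = sigmoid(p["Wo"] @ x + p["Uo"] @ h_prev + p["Bo"])   # output gate, Eq. (31)
    c = f * c_prev + i * d                                  # cell state, Eq. (33)
    h = o * np.tanh(c)                                      # hidden state, Eq. (32)
    return h, c

def init_params(rng, n_in, n_hidden, scale=0.1):
    """Random parameters for the four gates (illustrative initialization)."""
    p = {}
    for g in "fido":
        p["W" + g] = scale * rng.standard_normal((n_hidden, n_in))
        p["U" + g] = scale * rng.standard_normal((n_hidden, n_hidden))
        p["B" + g] = np.zeros(n_hidden)
    return p

rng = np.random.default_rng(0)
n_in, n_hidden, T_s = 5, 8, 10          # 5 features per vehicle as in Eq. (27); sizes assumed
params = init_params(rng, n_in, n_hidden)
h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for x_t in rng.standard_normal((T_s, n_in)):   # feed a length-T_s history
    h, c = lstm_step(x_t, h, c, params)
print(h.shape, c.shape)
```

The elementwise products `*` play the role of the $\otimes$ operator in Equations (32) and (33).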

Based on the control mechanisms of the gates mentioned above, LSTM can effectively predict the next vehicle position by leveraging real-time communication information and previous state estimates. The estimated information $\hat{x}_{t}$ for the next time step can be given by

\hat{x}_{t} = j(H_{t})

(34)

where $j$ represents the mapping composed of a fully connected layer and an activation function. The estimated output of the LSTM at time t is $\hat{x}_{t} = [\hat{x}_{t, 1}, \dots, \hat{x}_{t, K}]$ with $\hat{x}_{t, k} = [\hat{θ}_{b, k}, \hat{θ}_{r, k}]$. Additionally, the observed real information at this time is $x_{t} = [x_{t, 1}, \dots, x_{t, K}]$ with $x_{t, k} = [θ_{b, k}, θ_{r, k}]$.

Given the discrete nature of the blocking coefficient, the model adopts the Sigmoid activation function. With 0.5 set as the threshold for judgment, if the output of the Sigmoid function is greater than 0.5, it is considered that the channel is in a non-blocking state, and the blocking coefficient is set to 1. Conversely, the blocking coefficient is accordingly set to 0. In summary, the loss function needs to consider both the accuracy of position prediction and the estimation of the blocking state. This paper adopts a weighted summation method to balance the error representation of these two aspects, which can be shown as

Loss (θ_{l}) = \sum_{k = 1}^{K} \left[ α_{1} \left( |{\hat{θ}}_{b k} - θ_{b k}| + |{\hat{θ}}_{r k} - θ_{r k}| \right) + α_{2} {\left( |{\hat{ζ}}_{b, k} - ζ_{b, k}| + |{\hat{ζ}}_{r, k} - ζ_{r, k}| \right)}^{2} \right]

(35)

θ_{l}

represents the parameter set of the LSTM network;

α_{1}

and

α_{2}

are the weight coefficients used to adjust the importance of different types of errors. Algorithm 1 presents a summary of the training and application phases of the LSTM-based prediction algorithm. The proposed system adopts a “periodic sensing + continuous tracking” strategy. Initially, a full sensing stage is performed through beam scanning to estimate the vehicle angle and blockage state. Afterwards, an LSTM-based tracker predicts these parameters in subsequent time slots using historical observations, avoiding the need for sensing in every slot. When the prediction uncertainty exceeds a predefined threshold or the confidence score drops below a limit, a new sensing phase is triggered to recalibrate the estimated CSI. Hence, sensing is adaptively executed rather than slot-by-slot, significantly reducing pilot overhead while maintaining reliable tracking accuracy.
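As a minimal sketch of the weighted loss in (35) and of the 0.5 threshold rule for the blocking coefficient, the code below evaluates both for one sample. The function names, the tuple layout per vehicle, and the default weights of 1.0 are assumptions made for illustration; the paper does not prescribe them here.

```python
def blocking_state(sigmoid_output, threshold=0.5):
    """Map a Sigmoid output to the blocking coefficient: 1 = not blocked, 0 = blocked."""
    return 1 if sigmoid_output > threshold else 0

def lstm_loss(pred, true, alpha1=1.0, alpha2=1.0):
    """Weighted loss of Eq. (35).

    pred and true hold one tuple per vehicle:
    (theta_b, theta_r, zeta_b, zeta_r).
    alpha1 and alpha2 default to 1.0 only for this sketch.
    """
    total = 0.0
    for (tb_hat, tr_hat, zb_hat, zr_hat), (tb, tr, zb, zr) in zip(pred, true):
        angle_err = abs(tb_hat - tb) + abs(tr_hat - tr)
        block_err = abs(zb_hat - zb) + abs(zr_hat - zr)
        # The blocking term is the square of the summed absolute errors.
        total += alpha1 * angle_err + alpha2 * block_err ** 2
    return total

# One vehicle: angle error 0.25, blocking error 0.5 (squared: 0.25).
example = lstm_loss([(0.5, 1.0, 0.75, 0.25)], [(0.25, 1.0, 1, 0)])
```

During training the raw Sigmoid outputs would enter the loss, while the thresholded value from `blocking_state` would only be used when the prediction is applied.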

Algorithm 1 LSTM-Based Prediction Algorithm

procedure TrainingProcess
    Input: Dynamic CSI
    Output: Optimal network parameter $θ_{l}^{*}$
    Initialization: Network parameter $θ_{l}$
    for each training iteration do
        Sample the observed training data and input it into the network
        Obtain the final layer output $H_{t}$ of the LSTM network and compute the estimated output ${\hat{x}}_{t}$
        Calculate the loss function $Loss (θ_{l})$
        Minimize the loss function to update the network parameters $θ_{l}$
    end for
    return trained model
end procedure
procedure ApplicationProcess
    Input: CSI and vehicle parameters from the previous 10 time slots
    Output: CSI and vehicle parameters for the next time slot
    for each test sample do
        Input the CSI from the previous 10 time slots into the LSTM network
        The LSTM network outputs the estimated value ${\hat{x}}_{t}$
    end for
    return application results
end procedure
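The "periodic sensing + continuous tracking" strategy can be illustrated with the following sketch. It keeps a sliding window of the 10 most recent estimates, predicts from that window, and falls back to a full sensing stage when the reported uncertainty exceeds a limit. The callables `sense` and `predict`, the uncertainty measure, and the choice to sense until the window is full are hypothetical stand-ins for beam scanning and the trained LSTM; they are assumptions of this sketch, not details given in the paper.

```python
from collections import deque

WINDOW = 10  # number of past time slots fed to the predictor (as in Algorithm 1)

def track(num_slots, sense, predict, uncertainty_limit):
    """Return (slot, estimate, sensed) per slot.

    sense(t) -> measurement; predict(history) -> (estimate, uncertainty).
    Both callables are hypothetical placeholders.
    """
    history = deque(maxlen=WINDOW)
    log = []
    for t in range(num_slots):
        if len(history) < WINDOW:
            # Assumption: sense until enough history is available.
            est, sensed = sense(t), True
        else:
            est, unc = predict(list(history))
            sensed = unc > uncertainty_limit
            if sensed:
                est = sense(t)  # recalibrate with a new sensing phase
        history.append(est)
        log.append((t, est, sensed))
    return log

# Toy run: the "vehicle parameter" equals the slot index, and the toy
# predictor reports high uncertainty once the last estimate reaches 11.
def sense(t):
    return float(t)

def predict(history):
    est = history[-1] + 1.0
    unc = 1.0 if history[-1] >= 11.0 else 0.0
    return est, unc

log = track(13, sense, predict, uncertainty_limit=0.5)
```

In this toy run, slots 10 and 11 are served by prediction alone, while slot 12 triggers a new sensing phase.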

4.2. Multi-Factor Decision DDPG Algorithm (MFD-DDPG)

The algorithm architecture shown in Figure 2 consists of two parts: the BS switching decision and DDPG-based beamforming optimization. First, the decision whether to switch the BS is made by considering the vehicle's moving direction, the channel blocking condition, and the received signal strength in the IoV. Based on this decision, the beamforming and phase shift matrix designs are then optimized using DDPG.

At each time step, the system must determine whether a served vehicle requires BS switching. At time t, the communication signal received by vehicle k from the serving BS b is expressed as

y_{b, k, t} = H_{b, k, t} w_{b, k, t} s_{b, k, t} + H_{b, k, t} \sum_{j \neq k, j \in A (b)} w_{b, j, t} s_{b, j, t} + n_{k, c, t}

(36)

The signal strength and communication rate of BS b for vehicle k can be calculated as

P_{b, k, t} = {| y_{b, k, t} |}^{2}

(37)

E_{b, k, t} = \log_{2} \left( 1 + \frac{{| H_{b, k, t} w_{b, k, t} s_{b, k, t} |}^{2}}{\sum_{j \neq k, j \in A (b)} {| H_{b, k, t} w_{b, j, t} s_{b, j, t} |}^{2} + σ_{c}^{2}} \right)

(38)
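To make (36)-(38) concrete, the following sketch evaluates the received power and the per-vehicle log term of (38) for a single-antenna vehicle, so that H_{b, k, t} is a row vector. The function names and the dictionary of beamforming vectors keyed by vehicle index are illustrative assumptions, and unit-power symbols are assumed in the rate computation.

```python
import math

def inner(h, w):
    """Row-vector times column-vector product H w for complex lists."""
    return sum(hi * wi for hi, wi in zip(h, w))

def received_power(h_k, beams, symbols, noise):
    """Eqs. (36)-(37): power of the superposed signal seen by vehicle k."""
    y = sum(inner(h_k, w) * symbols[j] for j, w in beams.items()) + noise
    return abs(y) ** 2

def rate(h_k, beams, k, noise_power):
    """Per-vehicle term log2(1 + SINR) of Eq. (38).

    beams[j] is the beamforming vector of vehicle j in the served set A(b);
    unit-power symbols are assumed, so |s|^2 = 1.
    """
    signal = abs(inner(h_k, beams[k])) ** 2
    interference = sum(abs(inner(h_k, w)) ** 2
                       for j, w in beams.items() if j != k)
    return math.log2(1.0 + signal / (interference + noise_power))

# Toy example: two antennas, two served vehicles.
h = [1 + 0j, 0j]
beams = {0: [2 + 0j, 0j], 1: [1 + 0j, 0j]}
r0 = rate(h, beams, 0, noise_power=1.0)            # SINR = 4 / (1 + 1) = 2
p0 = received_power(h, beams, {0: 1, 1: -1}, 0.0)  # |2 - 1|^2 = 1
```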

In addition to signal strength, this paper introduces the blockage coefficient

ζ_{b, k, t}

between the BS and the k-th vehicle, as well as the travel direction coefficient

v_{b, k, t}

of the k-th vehicle, as key considerations.

The blockage coefficient describes the blockage condition of the BS-vehicle link. A blockage coefficient of 1 indicates that the link is not blocked, while a coefficient of 0 indicates that the link is blocked. When the direction coefficient

v_{b, k, t}

equals 1, the k-th vehicle is moving towards BS b at time t. Conversely, when

v_{b, k, t}

equals −1, the k-th vehicle is moving away from BS b.

This multi-factor decision-making approach yields a BS switching rule that better matches the demands of the IoV, which improves the overall communication effectiveness and dependability. Incorporating this BS switching decision into the beamforming design process also allows the DDPG network to learn better strategies as the vehicle-BS association matrix changes, ultimately maximizing the total performance of the communication system. The complete algorithm steps are presented in Algorithm 2.

Algorithm 2 Training procedure of the MFD-DDPG algorithm

Input: LSTM-optimized policy, network configuration, and minimum switching threshold $η_{\min}$.
Output: Actor and Critic network parameters $θ_{μ}^{*}$, $θ_{Q}^{*}$, and the optimal switching policy.
Initialization: Clear the experience replay buffer and initialize the network parameters $θ_{μ}$ and $θ_{Q}$.
for each episode do
    Reset the relationship state among the RIS-assisted IoV system, the vehicle, and the BS.
    for each step do
        Use the LSTM network to predict the channel state and vehicle state at the next time step.
        Obtain the current state $s (t)$ and action $a (t)$, add exploration noise, calculate the reward $r (t)$, and transition to the next state $s (t + 1)$.
        Store $[s (t), a (t), r (t), s (t + 1)]$ in the experience replay buffer $D$.
        Determine a potential switching BS with the highest reward according to the current state information, and obtain the details of the current BS and the vehicle.
        Use (37) to calculate the signal strength $P_{b, k, t}$ and estimate $P_{q, k, t}$ for the vehicle at the current time step.
        Initialize the switching state as $ς_{p} = 0$.
        if $P_{b, k, t} \leq P_{q, k, t}$ then
            if $ζ_{b, k, t} = 1$ then
                if $v_{b, k, t} = - 1$ and $v_{q, k, t} = 1$ then
                    Perform base station handover, $p = 1$;
                end if
            else
                if $v_{b, k, t} = - 1$ and $v_{q, k, t} = 1$ then
                    Perform base station handover, $p = 1$;
                else if $E_{b, k, t} < η_{\min}$ then
                    Perform base station handover, $p = 1$;
                end if
            end if
        end if
        Update the Critic and Actor networks, and softly update the target networks.
    end for
end for
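The handover test in the inner loop of Algorithm 2 can be written as a small decision function. The sketch below follows the nested conditions as listed; the function name and argument names are illustrative, and the rate threshold corresponds to the minimum switching threshold of the algorithm.

```python
def mfd_switch(P_b, P_q, zeta_b, v_b, v_q, E_b, eta_min):
    """Return 1 if vehicle k should hand over from serving BS b to candidate BS q.

    P_b, P_q : signal strength of the serving and the candidate BS
    zeta_b   : blockage coefficient of the serving link (1 = not blocked)
    v_b, v_q : direction coefficients (+1 towards the BS, -1 away from it)
    E_b      : communication rate from the serving BS
    eta_min  : minimum switching threshold
    """
    p = 0  # switching state, initialised to "no handover"
    if P_b <= P_q:
        leaving_b_for_q = (v_b == -1 and v_q == 1)
        if zeta_b == 1:
            # Serving link not blocked: switch only if moving away from b towards q.
            if leaving_b_for_q:
                p = 1
        else:
            # Serving link blocked: also switch if the rate is too low.
            if leaving_b_for_q:
                p = 1
            elif E_b < eta_min:
                p = 1
    return p

decision = mfd_switch(P_b=1.0, P_q=2.0, zeta_b=1, v_b=-1, v_q=1, E_b=5.0, eta_min=1.0)
```

The rule never switches when the serving BS still provides the stronger signal, which is consistent with the small number of handovers reported for MFD-DDPG.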

4.3. Mixed Discrete and Continuous Action DDPG (MDCA-DDPG)

The above algorithm decides whether BS switching is necessary by comprehensively considering multiple factors in each training step. This section presents an improved algorithm called MDCA-DDPG, shown in Figure 3, which generates discrete and continuous actions simultaneously. DDPG is a policy-based method that can learn an optimal policy in a continuous action space. In complex IoV scenarios, however, the agent faces not only continuous optimization but also discrete decisions, such as whether to execute BS switching. The core of this algorithm lies in integrating the discrete action into the action decision process, which enables a more flexible switching decision and beamforming design.

The state space in this paper includes all channel matrices, received communication signals, sensing echo signals, the reward value from the preceding time step, and the prediction outputs of the LSTM network. The information associated with BS b is collected as

U_{b} (t) = [u_{1_{b}} (t), \dots, u_{k_{b}} (t), \dots, u_{K_{b}} (t)]

(39)

U_{b} (t)

represents information related to BS b, covering the channel matrices between each served vehicle and the BS, the matrices between the vehicles and the serving RIS, the communication signals received by the vehicles, and the echo signals detected at the BS. The per-vehicle entry and the overall state can be represented as

u_{k} (t) = [h_{b, k}, h_{r, k}, y_{k, c}, r_{k}, {\hat{x}}_{k} (t)]

(40)

s (t) = [U_{1} (t), \dots, U_{b} (t), \dots, U_{B} (t), reward (t - 1)]

(41)

Based on the aforementioned state information, the Actor network provides action outputs. Specifically, MDCA-DDPG handles the decision-making for both discrete and continuous actions through the Actor network. The discrete action component is obtained by thresholding the output of the Actor network, which maps the network output into the discrete action space of BS switching decisions. Since the output values are normalized by the tanh activation function in the Actor network, this mapping can be based on the sign of the output: the discrete action is set to 1 (base station switching) when the output is greater than 0, and to 0 (no switching) when the output is less than 0.

For continuous action, the Actor network outputs a value directly, which is also normalized by the tanh activation function to ensure that the output action is within the appropriate range. These continuous action values are then used to adjust continuous control variables such as beamforming parameters. This hybrid action output mechanism allows MDCA-DDPG to flexibly handle complex problems of BS switching and communication optimization in IoV.
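The sign-based mapping described above can be sketched as follows. The layout of the raw Actor output, with the switching entries placed first, is an assumption of this sketch, as is the treatment of an output of exactly 0 as "no switching"; neither is specified in the paper.

```python
import math

def split_action(raw_outputs, num_vehicles):
    """Split tanh-normalized Actor outputs into (a_dis, a_con).

    Assumed layout: the first num_vehicles entries drive the switching
    decisions, the remaining entries form the continuous action.
    """
    squashed = [math.tanh(z) for z in raw_outputs]
    # Output > 0 -> switch (1); otherwise no switch (0).
    a_dis = [1 if v > 0 else 0 for v in squashed[:num_vehicles]]
    a_con = squashed[num_vehicles:]  # stays within [-1, 1]
    return a_dis, a_con

a_dis, a_con = split_action([0.3, -2.0, 0.0, 1.0], num_vehicles=2)
```

Because the thresholding is not differentiable, an implementation would typically pass the tanh outputs, rather than the thresholded values, through the gradient path; how this is handled in the paper's implementation is not stated here.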

The Actor network outputs an action vector

a (t)

that includes both the discrete action

a_{dis} (t)

and the continuous action

a_{con} (t)

. Therefore, the output model of the Actor network can be represented as

a (t) = π (s (t) | θ^{π}) = [a_{dis} (t), a_{con} (t)]

(42)

s (t)

represents the current state of the environment, and

θ^{π}

represents the parameters of the Actor network policy. The discrete action

a_{dis} (t)

is output by the Actor network and then thresholded to determine whether each vehicle should perform BS switching. It is represented as

a_{dis} = [p_{1}, \dots, p_{k}, \dots, p_{K}]

(43)

The continuous action

a_{con} (t)

is directly output by the Actor network and represents the beamforming matrices of all BSs as well as the phase shift matrices of the RISs. It is expressed as

a_{con} (t) = [vec (W_{1}), \dots, vec (W_{B}), diag (Θ_{1}), \dots, diag (Θ_{R})]

(44)
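One possible way to unpack the flat continuous action of (44) into beamformers and RIS phase shifts is sketched below. The layout (real parts followed by imaginary parts for each W_{b}, then one entry per RIS element scaled to a phase in [−π, π]) and the unit-modulus reflection coefficients of a passive RIS are assumptions of this sketch. The paper does not specify the encoding, and enforcing the power constraint is left out.

```python
import cmath
import math

def decode_continuous(a_con, num_bs, num_ant, num_veh, num_ris, num_elem):
    """Turn a flat vector with entries in [-1, 1] into (W, theta).

    W[b][j]  : beamforming vector (length num_ant) of vehicle j at BS b
    theta[r] : diagonal of the phase shift matrix of RIS r
    The layout is hypothetical; see the lead-in text.
    """
    size_w = num_ant * num_veh
    pos = 0
    W = []
    for _ in range(num_bs):
        re = a_con[pos:pos + size_w]
        im = a_con[pos + size_w:pos + 2 * size_w]
        pos += 2 * size_w
        flat = [complex(r, i) for r, i in zip(re, im)]
        # Column j of W_b is the beam of vehicle j (column-major vec).
        W.append([flat[j * num_ant:(j + 1) * num_ant] for j in range(num_veh)])
    theta = []
    for _ in range(num_ris):
        phases = a_con[pos:pos + num_elem]
        pos += num_elem
        # exp(j * pi * p) has unit modulus for every real p.
        theta.append([cmath.exp(1j * math.pi * p) for p in phases])
    return W, theta

W, theta = decode_continuous([0.5, -0.5, 0.25, 0.0, 0.0, 1.0],
                             num_bs=1, num_ant=2, num_veh=1,
                             num_ris=1, num_elem=2)
```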

The Critic network is used to assess the current policy, providing the expected return given the state

s (t)

and the actions

a_{dis} (t)

and

a_{con} (t)

. The input of the Critic network also needs to consider both discrete and continuous actions, and can be represented as

Q (s (t), a_{dis} (t), a_{con} (t) | θ^{Q})

, where

θ^{Q}

represents the Critic network parameters.

This study aims to achieve better communication and sensing performance while minimizing the number of switches. The reward function must therefore account for these objectives, as well as the power limit and the minimum communication rate threshold, in order to respect the hardware limits of the BS and ensure stable communication. The reward function is set as

r (t) = β_{1} \sum_{k = 1}^{K} {\tilde{E}}_{k} - β_{2} \sum_{k = 1}^{K} {\tilde{r}}_{snr, k} - β_{3} {\tilde{r}}_{power} - β_{4} \sum_{k = 1}^{K} {\tilde{Q}}_{k}

(45)

The reward function jointly evaluates the discrete and continuous actions. The first three terms evaluate the continuous action, while the fourth term evaluates the discrete action.

β_{1}

,

β_{2}

,

β_{3}

, and

β_{4}

are the weights of the respective reward terms.

r_{snr, k}

reflects the communication penalty. If the communication rate of a vehicle falls below the threshold, the reward function imposes a penalty as

r_{snr, k} = \begin{cases} 0, & E_{k} \geq η_{c}, \\ η_{c} - E_{k}, & E_{k} < η_{c} \end{cases}

(46)

The transmission power is likewise constrained through the reward function, which penalizes any excess of the total transmission power over the maximum allowable threshold. This penalty can be expressed as

r_{power} = \begin{cases} 0, & \sum_{b = 1}^{B} \| W_{b} \|_{F}^{2} \leq P_{max}, \\ \sum_{b = 1}^{B} \| W_{b} \|_{F}^{2} - P_{max}, & \sum_{b = 1}^{B} \| W_{b} \|_{F}^{2} > P_{max} \end{cases}

(47)

Q_{k}

represents the current total number of BS switches. A penalty is imposed whenever a switch occurs, so that the number of switches is reduced as much as possible. In this way, communication performance is enhanced while the number of switches is minimized and the power remains within an acceptable range, thereby maintaining the stability of the communication connection. The precise steps of training the MDCA-DDPG are provided in Algorithm 3.

To avoid optimization bias caused by differences in the magnitude of various physical quantities, we normalized each sub-objective in our implementation, expressed as

{\tilde{E}}_{k}

,

{\tilde{r}}_{snr, k}

,

{\tilde{r}}_{power}

and

{\tilde{Q}}_{k}

.
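The reward in (45) and the two penalty terms in (46) and (47) can be sketched as follows. The function and argument names are illustrative, the unit weights are placeholders rather than the values used in the simulations, and the inputs are assumed to be already normalized, since the normalization itself is not specified here.

```python
def rate_penalty(E_k, eta_c):
    """Eq. (46): shortfall below the minimum rate eta_c, zero otherwise."""
    return max(0.0, eta_c - E_k)

def power_penalty(beam_powers, P_max):
    """Eq. (47): excess of the total transmit power over P_max, zero otherwise.

    beam_powers[b] is the squared Frobenius norm of W_b.
    """
    return max(0.0, sum(beam_powers) - P_max)

def reward(E, Q, beam_powers, eta_c, P_max, betas=(1.0, 1.0, 1.0, 1.0)):
    """Eq. (45); E[k] and Q[k] are the rate and switching count of vehicle k.

    betas are placeholder weights chosen only for this sketch.
    """
    b1, b2, b3, b4 = betas
    return (b1 * sum(E)
            - b2 * sum(rate_penalty(e, eta_c) for e in E)
            - b3 * power_penalty(beam_powers, P_max)
            - b4 * sum(Q))

# Two vehicles: one below the rate threshold, one switch, slight power excess.
r = reward(E=[4.0, 1.0], Q=[1, 0], beam_powers=[0.5, 0.75], eta_c=2.0, P_max=1.0)
```

In this example the sum rate contributes 5, the rate shortfall 1, the power excess 0.25, and the switching count 1, giving a reward of 2.75.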

Algorithm 3 Training Procedure of the MDCA-DDPG Algorithm

Input: LSTM-optimized output strategy and network parameters.
Output: Optimal Actor and Critic network parameters $θ_{μ}^{*}$ and $θ_{Q}^{*}$, and the optimal switching decision.
Initialization: Clear the experience replay buffer and initialize the network parameters $θ_{μ}$ and $θ_{Q}$.
for each episode do
    Reset the RIS-assisted MBS-IoV system and randomly initialize the association state between vehicles and BSs.
    for each step do
        Use the fully trained LSTM network to predict the channel state and vehicle state at the next moment.
        Observe the current state $s (t)$, feed it into the Actor network, and identify the candidate BS q closest to the vehicle.
        The Actor network outputs the hybrid action $(a_{dis} (t), a_{con} (t))$, where $a_{dis} (t)$ determines whether to switch from the current BS b to the candidate BS q, and $a_{con} (t)$ is used for beamforming design. Exploration noise is added to $a_{con} (t)$.
        Based on the switching decision and hybrid beamforming design at the BS, the agent calculates the current reward by jointly considering the performance of the discrete and continuous actions.
        After executing the action $(a_{dis} (t), a_{con} (t))$, the agent transitions to the next state $s (t + 1)$ and stores $[s (t), a (t), r (t), s (t + 1)]$ in the experience replay buffer $D$.
        Sample mini-batches from $D$, update the Critic and Actor networks by minimizing the loss function, and softly update the target networks.
    end for
end for
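Two building blocks of the training loop in Algorithm 3, the experience replay buffer and the soft update of the target networks, can be sketched as below. The soft update uses the standard DDPG rule θ' ← τθ + (1 − τ)θ'; the class and function names, the flat-list parameter representation, and the value of τ are assumptions made for illustration.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of transitions [s(t), a(t), r(t), s(t+1)]."""

    def __init__(self, capacity):
        self.data = deque(maxlen=capacity)  # oldest transitions are dropped first

    def store(self, s, a, r, s_next):
        self.data.append((s, a, r, s_next))

    def sample(self, batch_size):
        """Draw a mini-batch uniformly at random without replacement."""
        return random.sample(list(self.data), min(batch_size, len(self.data)))

def soft_update(target, online, tau):
    """Return tau * online + (1 - tau) * target, element by element."""
    return [tau * w + (1.0 - tau) * wt for w, wt in zip(online, target)]

buf = ReplayBuffer(capacity=2)
for i in range(3):
    buf.store(i, 0, 0.0, i + 1)  # the first transition is evicted
new_target = soft_update(target=[0.0, 1.0], online=[1.0, 1.0], tau=0.25)
```

A small τ makes the target networks change slowly, which is the usual motivation for soft updates in DDPG.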

5. Numerical Results

The road center is defined as the coordinate origin (0 m, 0 m) to establish a coordinate system. It is assumed that there are K vehicles traveling along the x-axis, with their initial positions randomly determined and represented as

V_{1}

to

V_{K}

. The noise level is set to −80 dBm. The parameters used in the simulation are shown in Table 1.

First, we explore the effect of the number of BSs in a single-vehicle scenario. We denote the single-vehicle scenario as SU, the multi-vehicle scenario as MU, the single-base station case as SB, and the multi-base station case as MB. The power constraint is set to 27 dBm. Each BS has 32 antennas, while each RIS contains 96 elements. The vehicle is initially positioned at (−100 m, 0 m), with BSs located at (−50 m, −50 m) and (50 m, −50 m). The RIS positions are set at (−50 m, 50 m) and (50 m, 50 m). Figure 4 shows that the communication rate in the MB scenario is significantly higher than that in the SB scenario due to the larger coverage area. In the MB scenario, the proposed MFD-DDPG and MDCA-DDPG achieve similar performance, and as the iterations progress, the communication rate gradually increases and then stabilizes. MFD-DDPG primarily determines BS switching based on signal strength, so it makes a switching decision once the signal strength of a candidate BS is greater. MDCA-DDPG, on the other hand, employs a hybrid action approach that makes switching decisions while jointly optimizing the beamforming strategy based on the overall environment.

We use the cumulative distribution function (CDF) to represent the communication performance after convergence. As shown in Figure 5, in the MB scenario the two algorithms perform nearly identically, and both perform significantly better than in the SB scenario, especially in the higher communication rate region.
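For reference, an empirical CDF of the kind plotted for the communication rate can be obtained from a list of per-slot rates as in the sketch below. The function name and the sample values are purely illustrative and are not simulation results from this paper.

```python
def empirical_cdf(samples):
    """Return the sorted values and the fraction of samples <= each value."""
    xs = sorted(samples)
    n = len(xs)
    return xs, [(i + 1) / n for i in range(n)]

# Illustrative rates only (not data from the paper).
values, cdf = empirical_cdf([3.0, 1.0, 2.0, 4.0])
```

A curve that lies further to the right indicates that higher rates are reached more often, which is how the MB and SB curves are compared.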

Secondly, the effect of the number of BSs on system performance in a multi-vehicle scenario is shown in Figure 6. It is assumed that there are three vehicles with initial positions at (−100 m, 0 m), (−50 m, 0 m), and (0 m, 0 m), respectively. The numbers of BS antennas and RIS elements are both set to 8, and the power constraint for a single vehicle is 27.8 dBm. Figure 6 shows that the performance in MB scenarios is significantly better than in SB scenarios. However, the performance of the MFD-DDPG is far inferior to that of the MDCA-DDPG, because in BS switching it is not enough to consider factors such as signal strength; interference between vehicles is also an important factor. The MDCA-DDPG meets the needs of such scenarios but also leads to more switching: after convergence, the MFD-DDPG requires only 3 switches, while the MDCA-DDPG results in as many as 25 switches. Therefore, each of these algorithms has its own advantages. To analyze the effect of multiple BSs compared to a single BS, the communication rates of all vehicles are represented using the CDF. Each curve represents the combined communication rate distribution of all vehicles, as shown in Figure 7.

Thirdly, we investigate the effect of the RIS on system performance in a dual-BS scenario, temporarily disregarding vehicle interference and link blockage. The BSs are configured with 32 antennas, and the transmission power is set to 27 dBm. The starting point of the vehicle is (−100 m, 0 m), with the BSs located at (−50 m, −50 m) and (50 m, −50 m), and the corresponding RIS positions at (−50 m, 50 m) and (50 m, 50 m). We analyze the impact of 0, 32, and 96 RIS elements on the system reward value and communication rate. As shown in Figure 8, the system reward gradually increases with iterations and eventually stabilizes. Figure 9 shows that the system communication rate increases with the number of RIS elements. Without RIS, the communication rates of the two algorithms are approximately 5.2 Kbps/Hz. With 32 RIS elements, the system communication rate reaches 6.9 Kbps/Hz, an improvement of about 32.6%. With 96 RIS elements, the communication rate reaches 8.0 Kbps/Hz, an improvement of over 50% compared to the scenario without RIS.

Fourth, to assess the practicality of the BS switching algorithms discussed in this paper, the proposed algorithms are compared with two common baselines. ‘Distance’ refers to the distance-based algorithm [44], which performs a switch when the candidate BS is closer to the vehicle than the serving BS. ‘Block’ refers to the blockage-based switching algorithm, which performs BS switching when the BS detects that the channel to the vehicle is interrupted. The BS switching algorithms are compared against these two baselines in both single-user and multi-user scenarios.
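The two baseline rules are simple enough to state in a few lines. The sketch below follows the descriptions above; the function names are illustrative, positions are given as (x, y) coordinates in metres, and a blockage coefficient of 0 is taken to mean that the serving link is interrupted.

```python
import math

def distance_switch(vehicle, serving_bs, candidate_bs):
    """'Distance' baseline: switch when the candidate BS is closer to the vehicle."""
    closer = math.dist(vehicle, candidate_bs) < math.dist(vehicle, serving_bs)
    return 1 if closer else 0

def block_switch(zeta_serving):
    """'Block' baseline: switch when the serving link is blocked (coefficient 0)."""
    return 1 if zeta_serving == 0 else 0

# BS positions taken from the simulation setup: (-50, -50) and (50, -50).
at_start = distance_switch((-100.0, 0.0), (-50.0, -50.0), (50.0, -50.0))
past_mid = distance_switch((10.0, 0.0), (-50.0, -50.0), (50.0, -50.0))
```

Each rule looks at a single factor, which is the limitation that the multi-factor decision of MFD-DDPG is meant to address.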

The system configuration includes 32 BS antennas, 96 RIS elements, and a transmission power of 27 dBm. The starting point of the vehicle is (−100 m, 0 m). The BSs are placed at (−50 m, −50 m) and (50 m, −50 m), with the corresponding RIS positions at (−50 m, 50 m) and (50 m, 50 m). Additionally, an obstacle is introduced in this scenario by placing a stationary large truck with a length of 5 m at the road position (−20 m, −3 m). Under these parameter settings, the paper compares how the communication rate of these algorithms varies with the noise level, as shown in Figure 10.

The results show that the proposed algorithms improve communication performance by 6.25% and 5% compared to the block and distance algorithms, respectively. In this scenario, the RIS allows link reconstruction even when there is blockage. Switching to another, more distant BS due to blockage can therefore degrade system performance, which makes the block-based algorithm the poorest performer among these options. The distance-based switching algorithm considers the impact of distance on communication but neglects factors such as blockage, so its switching decisions are suboptimal as well.

Next, simulations are conducted for a multi-vehicle scenario. The three vehicles are placed at (−100 m, 0 m), (−50 m, 0 m), and (0 m, 0 m). The BSs are placed at (−100 m, −50 m) and (100 m, −50 m), with the corresponding RIS positions at (−100 m, 50 m) and (100 m, 50 m). As seen in Figure 11, the MDCA-DDPG outperforms the two benchmark algorithms by 42.2% and 39.6%, respectively. This is because the MDCA-DDPG incorporates the switching decision into the actions of the DDPG network, allowing the agent to comprehensively consider factors such as interference between multiple vehicles and thus make better decisions. In contrast, the other three algorithms make decisions solely based on the environmental state at a given moment. Additionally, the results demonstrate that the proposed MFD-DDPG outperforms the distance and block algorithms. This is because, unlike these two algorithms, the MFD-DDPG takes into account more factors, such as signal strength, channel blockage, and vehicle travel direction. In terms of application, MDCA-DDPG achieves higher rates at the cost of frequent handovers, which increase signaling overhead and put network stability at risk; it is therefore better suited to high-throughput multimedia applications. Conversely, MFD-DDPG ensures stable connectivity with minimal overhead, making it well suited to mission-critical IoV services like autonomous driving.

6. Conclusions

In this work, we analyze an RIS-assisted MBS scenario that includes obstacles and multi-vehicle interference. We formulate an optimization problem that jointly optimizes the RIS phase shift and beamforming to maximize the communication rate, minimize the sensing error, and reduce the switching frequency. Two novel algorithms, MDCA-DDPG and MFD-DDPG, are introduced to solve this problem. Simulation results confirm the efficacy of the proposed algorithms: the RIS-assisted MBS system demonstrates superior performance in enhancing communication efficiency and reducing the number of switches. The proposed MBS–RIS cooperative framework enhances link reliability and sensing accuracy in dynamic vehicular environments, demonstrating the advantages of MBS systems even under orthogonal frequency allocation. In terms of feasibility, the trained MDCA-DDPG and MFD-DDPG models can be deployed on edge servers for millisecond-level decision-making, while the passive RIS hardware can be easily integrated into urban infrastructure with minimal overhead. Future research will focus on the deployment of distributed RIS management to further reduce computational complexity and improve scalability. Additionally, we will investigate the integration of V2X communication standards with the proposed system to enhance the feasibility of practical applications. Although the current simplifying assumptions limit the generality of the results, future work will extend this framework to realistic IoV scenarios, considering multi-path effects and dynamic vehicle mobility.

Author Contributions

Conceptualization, J.L., Y.Z. and D.W.; methodology, J.L., Y.Z. and D.W.; software, J.L., Y.Z. and D.W.; validation, Y.Z.; formal analysis, J.L., Y.Z. and D.W.; writing—original draft preparation, J.L., Y.Z. and D.W.; writing—review and editing, J.L., Y.Z. and D.W.; visualization, J.L.; supervision, Y.Z.; project administration, D.W.; funding acquisition, D.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the Natural Science Foundation of Xiamen, China (Grant number 3502Z20227177) and the National Natural Science Foundation of China (Grant numbers 62271427 and 62171392).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Xi, N.; Li, W.; Jing, L.; Ma, J. ZAMA: A ZKP-Based Anonymous Mutual Authentication Scheme for the IoV. IEEE Internet Things J. 2022, 9, 22903–22913.
Nwakanma, C.I.; Ahakonye, L.A.C.; Njoku, J.N.; Odirichukwu, J.C.; Okolie, S.A.; Uzondu, C.C.; Nweke, N.; Kim, D.-S. Explainable Artificial Intelligence (XAI) for Intrusion Detection and Mitigation in Intelligent Connected Vehicles: A Review. Appl. Sci. 2023, 13, 1252.
Hou, X.; Ren, Z.; Wang, J.; Cheng, W.; Ren, Y.; Chen, K.C.; Zhang, H. Reliable Computation Offloading for Edge-Computing-Enabled Software-Defined IoV. IEEE Internet Things J. 2020, 7, 7097–7111.
Abboud, K.; Omar, H.A.; Zhuang, W. Interworking of DSRC and Cellular Network Technologies for V2X Communications: A Survey. IEEE Trans. Veh. Technol. 2016, 65, 9457–9470.
Noor-A-Rahim, M.; Liu, Z.; Lee, H.; Khyam, M.O.; He, J.; Pesch, D.; Moessner, K.; Saad, W.; Poor, H.V. 6G for Vehicle-to-Everything (V2X) Communications: Enabling Technologies, Challenges, and Opportunities. Proc. IEEE 2022, 110, 712–734.
Duan, W.; Gu, J.; Wen, M.; Zhang, G.; Ji, Y.; Mumtaz, S. Emerging Technologies for 5G-IoV Networks: Applications, Trends and Opportunities. IEEE Netw. 2020, 34, 283–289.
Liu, A.; Huang, Z.; Li, M.; Wan, Y.; Li, W.; Han, T.X.; Liu, C.; Du, R.; Tan, D.K.P.; Lu, J.; et al. A Survey on Fundamental Limits of Integrated Sensing and Communication. IEEE Commun. Surv. Tutor. 2022, 24, 994–1034.
Liu, F.; Masouros, C.; Petropulu, A.P.; Griffiths, H.; Hanzo, L. Joint radar and communication design: Applications, state-of-the-art, and the road ahead. IEEE Trans. Commun. 2020, 68, 3834–3862.
Zhang, D.; Li, A.; Shirvanimoghaddam, M.; Cheng, P.; Li, Y.; Vucetic, B. Codebook-based training beam sequence design for millimeter-wave tracking systems. IEEE Trans. Wirel. Commun. 2019, 18, 5333–5349.
Va, V.; Vikalo, H.; Heath, R.W. Beam tracking for mobile millimeter wave communication systems. In Proceedings of the 2016 IEEE Global Conference on Signal and Information Processing (GlobalSIP), Washington, DC, USA, 7–9 December 2016; pp. 743–747.
Zhang, Z.; Dai, L.; Chen, X.; Liu, C.; Yang, F.; Schober, R.; Poor, H.V. Active RIS vs. Passive RIS: Which Will Prevail in 6G? IEEE Trans. Commun. 2023, 71, 1707–1725.
Naaz, F.; Nauman, A.; Khurshaid, T.; Kim, S.-W. Empowering the vehicular network with RIS technology: A state-of-the-art review. Sensors 2024, 24, 337.
Du, Z.; Liu, F.; Yuan, W.; Masouros, C.; Zhang, Z.; Xia, S.; Caire, G. Integrated sensing and communications for V2I networks: Dynamic predictive beamforming for extended vehicle targets. IEEE Trans. Wirel. Commun. 2022, 22, 3612–3627.
Feng, J.; Zhang, P.; Huang, L.; Qian, G. Reconfigurable Intelligent Surface Aided DFRC Vehicular Networks. In Proceedings of the 2023 6th World Conference on Computing and Communication Technologies (WCCCT), Chengdu, China, 6–8 January 2023; pp. 1–6.
Yan, S.; Cai, S.; Xia, W.; Zhang, J.; Xia, S. A reconfigurable intelligent surface aided dual-function radar and communication system. In Proceedings of the 2022 2nd IEEE International Symposium on Joint Communications & Sensing (JC&S), Seefeld, Austria, 9–10 March 2022; pp. 1–6.
He, Y.; Cai, Y.; Mao, H.; Yu, G. RIS-assisted communication radar coexistence: Joint beamforming design and analysis. IEEE J. Sel. Areas Commun. 2022, 40, 2131–2145.
Wei, Z.; Xu, R.; Feng, Z.; Wu, H.; Zhang, N.; Jiang, W.; Yang, X. Symbol-Level Integrated Sensing and Communication Enabled Multiple Base Stations Cooperative Sensing. IEEE Trans. Veh. Technol. 2023, 73, 724–738.
Lu, X.; Wei, Z.; Xu, R.; Wang, L.; Lu, B.; Piao, J. Integrated Sensing and Communication Enabled Multiple Base Stations Cooperative UAV Detection. In Proceedings of the 2024 IEEE International Conference on Communications Workshops (ICC Workshops), Denver, CO, USA, 9–13 June 2024; pp. 1882–1887.
Xiao, M.; Mumtaz, S.; Huang, Y.; Dai, L.; Li, Y.; Matthaiou, M.; Karagiannidis, G.K.; Björnson, E.; Yang, K.; I, C.-L. Millimeter wave communications for future mobile networks. IEEE J. Sel. Areas Commun. 2017, 35, 1909–1935.
Luo, H.; Liu, R.; Li, M.; Liu, Y.; Liu, Q. Joint Beamforming Design for RIS-Assisted Integrated Sensing and Communication Systems. IEEE Trans. Veh. Technol. 2022, 71, 13393–13397.
Guo, Y.; Liu, Y.; Wu, Q.; Li, X.; Shi, Q. Joint Beamforming and Power Allocation for RIS-Aided Full-Duplex Integrated Sensing and Uplink Communication System. IEEE Trans. Wirel. Commun. 2024, 23, 4627–4642.
Wang, Y.; Zhang, W.; Liu, C.; Sun, J.; Wang, C.X. Reconfigurable Intelligent Surface for NLOS Integrated Sensing and Communications. In Proceedings of the 2022 IEEE/CIC International Conference on Communications in China (ICCC), Sanshui, China, 11–13 August 2022; pp. 708–712.
Boukouvala, F.; Misener, R.; Floudas, C.A. Global optimization advances in mixed-integer nonlinear programming, MINLP, and constrained derivative-free optimization, CDFO. Eur. J. Oper. Res. 2016, 252, 701–727.
Yu, Z.; Zhang, X.; Li, Y.; Zhao, W. Active RIS-Aided ISAC Systems: Beamforming Design and Performance Analysis. IEEE Trans. Commun. 2024, 72, 1578–1595.
Sankar, R.S.P.; Chepuri, S.P.; Eldar, Y.C. Beamforming in Integrated Sensing and Communication Systems with Reconfigurable Intelligent Surfaces. IEEE Trans. Wirel. Commun. 2024, 23, 4017–4031.
Long, X.; Zhao, Y.; Wu, H.; Xu, C.-Z. Deep Reinforcement Learning for Integrated Sensing and Communication in RIS-Assisted 6G V2X System. IEEE Internet Things J. 2024, 11, 39834–39849.
Xia, F.; Zhang, C.; Wang, R.; Li, J. Sensing-Enabled Predictive Beamforming Design for RIS-Assisted V2I Systems: A Deep Learning Approach. IEEE Trans. Wirel. Commun. 2024, 23, 5571–5586.
Chen, K.; Qi, C.; Dobre, O.A.; Li, G.Y. Simultaneous Beam Training and Target Sensing in ISAC Systems with RIS. IEEE Trans. Wirel. Commun. 2024, 23, 2696–2710.
Ma, Y.; Chen, X.; Zhang, L. Base Station Handover Based on User Trajectory Prediction in 5G Networks. In Proceedings of the 2021 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom), New York, NY, USA, 30 September–3 October 2021; pp. 1476–1482.
Tang, R.; Qi, C.; Sun, Y. Blockage Prediction and Fast Handover of Base Station for Millimeter Wave Communications. IEEE Commun. Lett. 2023, 27, 2142–2146.
Zhang, Z.; Jiang, Z.; Yang, B.; She, X. A Beamforming-Based Enhanced Handover Scheme with Adaptive Threshold for 5G Heterogeneous Networks. Electronics 2023, 12, 4131.
Alizadeh, A.; Lim, B.; Vu, M. Multi-Agent Q-Learning for Real-Time Load Balancing User Association and Handover in Mobile Networks. IEEE Trans. Wirel. Commun. 2024, 23, 9001–9015.
Kose, A.; Foh, C.H.; Lee, H.; Dianati, M. Beam-centric handover decision in dense 5G-mmWave networks. In Proceedings of the 2020 IEEE 31st Annual International Symposium on Personal, Indoor and Mobile Radio Communications, London, UK, 31 August–3 September 2020; pp. 1–6.
Zhao, D.; Lu, H.; Wang, Y.; Sun, H.; Gui, Y. Joint power allocation and user association optimization for IRS-assisted mmWave systems. IEEE Trans. Wirel. Commun. 2021, 21, 577–590.
Han, D.; Liu, T.; Wu, F.; Zhou, Z.; Wei, Y. Performance Optimization of Multi-Base Station Heterogeneous Network Based on New Energy Power Supply. IEEE Syst. J. 2022, 17, 2331–2342.
Tong, X.; Zhang, Z.; Zhang, Y.; Yang, Z.; Huang, C.; Wong, K.-K.; Debbah, M. Environment sensing considering the occlusion effect: A multi-view approach. IEEE Trans. Signal Process. 2022, 70, 3598–3615.
Wei, Z.; Jiang, W.; Feng, Z.; Wu, H.; Zhang, N.; Han, K.; Xu, R.; Zhang, P. Integrated sensing and communication enabled multiple base stations cooperative sensing towards 6G. IEEE Netw. 2023, 38, 207–215.
Zhang, J.; Wang, R.; Wu, J. Efficient Resource Allocation for Multi-BS Multi-UE Integrated Sensing and Communication System. In Proceedings of the 2023 IEEE Conference on Antenna Measurements and Applications (CAMA), Genoa, Italy, 15–17 November 2023; pp. 364–368.
Salem, A.A.; Ismail, M.H.; Ibrahim, A.S. Active reconfigurable intelligent surface-assisted MISO integrated sensing and communication systems for secure operation. IEEE Trans. Veh. Technol. 2022, 72, 4919–4931.
Song, X.; Zhao, D.; Hua, H.; Han, T.X.; Yang, X.; Xu, J. Joint transmit and reflective beamforming for IRS-assisted integrated sensing and communication. In Proceedings of the 2022 IEEE Wireless Communications and Networking Conference (WCNC), Austin, TX, USA, 10–13 April 2022; pp. 189–194.
Nayeri, P.; Yang, F.; Elsherbeni, A.Z. Beam-scanning reflectarray antennas: A technical overview and state of the art. IEEE Antennas Propag. Mag. 2015, 57, 32–47. [Google Scholar] [CrossRef]
Wang, R.; Xia, F.; Huang, J.; Wang, X.; Fei, Z. Cap-net: A deep learning-based angle prediction approach for ISAC-enabled RIS-assisted V2I communications. In Proceedings of the 22nd IEEE International Conference on Communication Technology (ICCT), Nanjing, China, 11–14 November 2022; pp. 1255–1259. [Google Scholar]
Shi, X.; Chen, Z.; Wang, H.; Yeung, D.Y.; Wong, W.K.; Woo, W.C. Convolutional LSTM network: A machine learning approach for precipitation nowcasting. In Proceedings of the 29th international Conference on Neural Information Processing Systems-Volume 1 (NIPS’15), Montreal, QC, Canada, 7–12 December 2015; pp. 802–810. [Google Scholar]
Cacciapuoti, A.S. Mobility-aware user association for 5G mmWave networks. IEEE Access 2017, 5, 21497–21507. [Google Scholar] [CrossRef]

Figure 1. RIS-assisted MBS IoV scenario.

Figure 2. Architecture of MFD-DDPG.

Figure 3. Architecture of MDCA-DDPG.

Figure 4. Communication rate under different numbers of BSs for SU.

Figure 5. CDF under different numbers of BSs for SU.

Figure 6. Communication rate under different numbers of BSs for MU.

Figure 7. CDF under different numbers of BSs for MU.

Figure 8. Rewards of the two algorithms for different numbers of RIS elements.

Figure 9. Communication rate of the two algorithms for different numbers of RIS elements.

Figure 10. Communication rate of SU-MB with different algorithms.

Figure 11. Communication rate of MU-MB with different algorithms.

Table 1. Parameter Settings.

Parameter Name	Value
Vehicle Speed	20 m/s
Time Slot Length	20 ms
Path Loss at a Distance of 1 m	−30 dB
Base Station Transmission Frequency	30 GHz
Path Loss Factor	2.3, 2.3
Blockage Coefficient	0, 1
DDPG Network Samples per Update	256
DDPG Network Soft Update Parameter	0.005
DDPG Network Discount Rate	0.98
Reward Parameter Weights	0.9, −5, −1, −2
Experience Pool Size	600,000
Neurons per Actor Network Layer	64, 256
Actor Output Layer Function	tanh
Actor/Critic Network Learning Rate	1 × 10⁻⁵, 1 × 10⁻⁴
LSTM Network Learning Rate	1 × 10⁻³
LSTM Network Time Step Length	10
LSTM Network Hidden Layers	128
LSTM Network Samples per Update	64
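
To make a few of the Table 1 settings concrete, the following minimal Python sketch derives some quantities they imply. The class and function names are illustrative and do not come from the paper. The log-distance path-loss form is an assumed, commonly used model, not necessarily the exact channel model of Section 3. Table 1 lists two path loss factors (both 2.3) without stating here which links they apply to, so the sketch uses a single exponent.

```python
import math
from dataclasses import dataclass

SPEED_OF_LIGHT = 299_792_458.0  # m/s


@dataclass(frozen=True)
class Table1Settings:
    """Values copied from Table 1; the field names are illustrative."""
    vehicle_speed_mps: float = 20.0
    slot_length_s: float = 0.020
    ref_gain_db: float = -30.0        # "path loss at a distance of 1 m"
    carrier_hz: float = 30e9
    path_loss_exponent: float = 2.3   # Table 1 lists 2.3 for both factors
    soft_update_tau: float = 0.005
    discount: float = 0.98


def displacement_per_slot(cfg: Table1Settings) -> float:
    """Distance a vehicle travels in one time slot, in metres."""
    return cfg.vehicle_speed_mps * cfg.slot_length_s


def wavelength(cfg: Table1Settings) -> float:
    """Carrier wavelength in metres."""
    return SPEED_OF_LIGHT / cfg.carrier_hz


def path_gain_db(cfg: Table1Settings, distance_m: float) -> float:
    """Assumed log-distance model: gain(d) = gain(1 m) - 10 * alpha * log10(d)."""
    return cfg.ref_gain_db - 10.0 * cfg.path_loss_exponent * math.log10(distance_m)


def soft_update(target, online, tau):
    """Standard DDPG target update: target <- (1 - tau) * target + tau * online."""
    return [(1.0 - tau) * t + tau * w for t, w in zip(target, online)]


cfg = Table1Settings()
print(displacement_per_slot(cfg))   # metres moved per 20 ms slot
print(wavelength(cfg))              # metres, at 30 GHz
print(path_gain_db(cfg, 10.0))      # dB, at 10 m under the assumed model
print(soft_update([0.0], [1.0], cfg.soft_update_tau))
```

Under these settings a vehicle moves about 0.4 m per time slot, the carrier wavelength is roughly 1 cm, and the assumed model gives −53 dB at 10 m. With a soft update parameter of 0.005, each update moves the target network only 0.5% of the way toward the online network.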
