Collaborative Online Learning-Based Distributed Handover Scheme in Hybrid VLC/RF 5G Systems

Maimaiti, Saidiwaerdi; Huang, Shuman; Zhang, Kaisa; Liu, Xuewen; Xu, Zhiwei; Mi, Jihang

doi:10.3390/electronics14061142


by Saidiwaerdi Maimaiti ¹,²,†, Shuman Huang ³,†, Kaisa Zhang ³,*, Xuewen Liu ⁴, Zhiwei Xu ⁵ and Jihang Mi ⁶

¹ School of Information Engineering, Xinjiang Institute of Engineering, Urumqi 830023, China
² College of Physics and Electrical Engineering, Kashi University, Kashi 844008, China
³ School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China
⁴ Department of Electronics and Communication Engineering, Beijing Electronics Science and Technology Institute, Beijing 100070, China
⁵ Haihe Laboratory of Information Technology Application Innovation (ITAI), Tianjin 300459, China
⁶ Digital Technology Company Limited of Aerospace Science and Industry Corp., Beijing 100048, China
* Author to whom correspondence should be addressed.
† These authors are co-first authors of this article.

Electronics 2025, 14(6), 1142; https://doi.org/10.3390/electronics14061142

Submission received: 13 February 2025 / Revised: 4 March 2025 / Accepted: 12 March 2025 / Published: 14 March 2025

(This article belongs to the Special Issue New Advances in Distributed Computing and Its Applications)


Abstract

This paper investigates handover in hybrid visible light communication (VLC)/radio frequency (RF) networks. In such a network, mobile users are prone to experience frequent handovers (FHOs). To address this issue, we propose a collaborative online learning-based handover scheme (COLH) for hybrid VLC/RF 5G systems. By selecting the next access point (AP) to which a user should be handed over, our goal is to make the user–AP connection last as long as possible after the handover; this connection time is defined as a reward that is learned online through a multi-armed bandit (MAB) framework. Unlike previous schemes based on independent or collective learning, our scheme first dynamically clusters users with similar feedback on a given AP. Second, the users in the same cluster collaborate in estimating the expected reward for that AP, and the AP with the maximum expected reward is selected as the next AP. This scheme can be implemented without extensive offline training or location information; thus, its practicality is greatly enhanced. The simulation results show that the proposed scheme outperforms existing benchmarks in reducing handovers.

Keywords: collaborative learning; distributed handover; hybrid visible light communication (VLC)/radio frequency (RF) 5G network; multi-armed bandit (MAB)

1. Introduction

With the explosive growth of mobile traffic, transmission delay, operational costs, user experience, and the spectrum crunch in radio frequency (RF) bands will become pressing concerns. Visible light communication (VLC) has many advantages over RF, including access to an abundant, free, and unregulated light spectrum and the ability to provide very high-speed data transmission. Moreover, there is no electromagnetic interference between VLC and RF. Therefore, integrating VLC into the existing RF network can overcome RF limitations and improve system performance [1,2]. In addition, compared with standalone VLC networks, adding VLC access to existing RF networks also improves system throughput [3].

The resulting hybrid VLC/RF 5G network, which combines the high-speed data transmission of VLC and the ubiquitous coverage of RF, has been proven to provide better network performance than a standalone VLC or RF network [3]. However, mobile users in this network readily experience frequent handovers (FHOs) due to three factors: (i) the small coverage of a VLC access point (AP); (ii) the vulnerability of VLC links to blockage; and (iii) the tricky handover decisions in the case of multiple candidate APs with overlapping coverage. The large number of handovers will cause heavy signaling overhead, higher outage probability, and increased power consumption. As such, designing an efficient handover scheme to address the FHO issue is highly important in hybrid VLC/RF 5G networks.

Conventional handover schemes rely on the current channel state while ignoring its future variations and, thus, are regarded as “short-sighted” [4]. Such schemes are not efficient in VLC-enabled networks due to the abrupt VLC channel variations. Research on reducing handovers in hybrid VLC/RF 5G networks has only just begun. Analyses based on historical data are often used to optimize handover [5,6]. The Analytic Hierarchy Process and Cooperative Game Method are used to determine whether to perform vertical switching [7]. Researchers have also proposed an improved genetic algorithm method [8], which uses time series classification to minimize the number of handovers in VLC systems. To avoid unnecessary handovers, the authors of [9] leveraged the rate of variation in received signal strength to indicate whether a user is moving towards a certain AP, and the authors of [10] proposed a power-based handover scheme to skip unnecessary handovers. On this basis, the authors of [11] further optimized the dynamic adjustment of the selection preference between VLC and RF via machine learning. Nevertheless, these methods only capture very near-future variations in the channel state within the time-to-trigger period (i.e., hundreds of milliseconds). Once the user alters its moving direction or encounters an unexpected obstacle soon after performing the decision, a new unnecessary handover will be triggered. An intuitive solution is to make a “long-sighted” decision by predicting the post-decision trajectory [12] or link blockage [13]. Interestingly, the author of [14] stated that the trajectories of different users have similarities. This trajectory similarity was then exploited to improve the prediction accuracy in [15]. More recently, the authors of [16] described a classification model used to predict the type of user’s trajectory and assist a reinforcement learning model to make handover decisions that can dynamically adapt to new network conditions. 
Despite this, prediction-based solutions require extensive offline training and users’ location information, which are not always available in practice. In [17], the researchers developed a neural network-based handover scheme that classifies the handover moments between light fidelity (LiFi) and wireless fidelity (WiFi) based on channel quality, user movement, and device direction.

Multi-armed bandit (MAB) is a powerful online learning technique in which an agent learns from interactions with the network to maximize its expected reward, enabling it to reach the “long-sighted” decision in dynamic networks. Accordingly, this paper proposes a collaborative online learning-based handover scheme (COLH) in hybrid VLC/RF 5G networks, aiming to maximize the expected user–AP connection time after each handover. Aided by the clustering of bandits (CBA) approach [18], our scheme can be divided into two modules, i.e., MAB-based AP selection (BAPS) and feedback sharing (FBS). First, FBS dynamically clusters similar users according to the shared feedback, which indirectly exploits the trajectory similarity without requiring location information. Then, in BAPS, the users in the same cluster can select APs in a collaborative way. That is, each user selects the AP by learning from the historical handover experiences of both themselves and similar users.

Briefly, the main contributions are as follows: (1) unlike independent and collective learning [19], we are the first to propose a collaborative online learning framework for handovers in hybrid VLC/RF 5G networks; (2) instead of using users’ location information, our scheme exploits the trajectory similarity across users using only the shared feedback, making it practical and privacy-preserving; and (3) the simulation results verify the superiority of our scheme over the benchmarks in reducing handovers.

2. System Model

We consider a multi-user hybrid network with one WiFi AP and multiple VLC APs, as depicted in Figure 1. Let N = \{1, \dots, n, \dots, N\} and M = \{1, \dots, m, \dots, M\} denote the sets of users and APs, respectively. The WiFi AP (denoted as AP 1), as a typical indoor RF technique, is deployed at the center. The VLC APs are deployed on the ceiling in a lattice topology. Specifically, the number 1 indicates the WiFi AP, and the numbers 2–10 indicate the VLC APs. All the WiFi and VLC APs are linked via fiber to a central controller (CC), which takes charge of handover in this network. There is no interference between the WiFi AP and any VLC AP, while co-channel interference (CCI) exists where the coverage areas of the VLC APs overlap. We apply a frequency reuse factor of 4 to mitigate the CCI [9]. In addition, we use time-division multiple access to allow the APs to serve multiple users [11]. The widely used random waypoint mobility (RWP) model [20] and geometric model [21] are employed to generate the user’s mobility pattern and link blockage, respectively.

If the line-of-sight (LOS) path is unblocked, the VLC channel gain is given by [2]

G_{m, n} = \begin{cases} \frac{(k + 1) l^{2} A}{2 π d_{m, n}^{2} \sin^{2} (Ψ)} g_{f} \cos^{k} (ϕ) \cos (φ), & 0 \leq φ \leq Ψ \\ 0, & φ > Ψ \end{cases} \qquad (1)

Otherwise, G_{m, n} = 0, where m \in \{2, \dots, M\}. Here, d_{m, n} is the distance between AP m and user n; ϕ and φ are the angles of irradiance and incidence, respectively. For transmitter m, k = \frac{- 1}{\log_{2} (\cos (Φ_{1 / 2}))} is the Lambertian order, and Φ_{1 / 2} is the half-intensity radiation angle. For receiver n, l is the refractive index, g_{f} is the optical filter gain, and A and Ψ are the physical area and field of view of the photodiode, respectively. The signal-to-interference-plus-noise ratio (SINR) of the link between AP m and user n is

γ_{m, n} = \frac{(η P_{V} G_{m, n})^{2}}{B_{V} N_{V} + \sum_{i \in I \setminus \{m\}} (η P_{V} G_{i, n})^{2}} \qquad (2)

where η is the detector responsivity; P_{V} and B_{V} denote the optical power and bandwidth of the VLC AP, respectively; N_{V} is the noise power spectral density in VLC; and I denotes the set of VLC APs that use the same frequency spectrum as AP m. Here, the achievable data rate can be approximated as [22]

R_{m, n} = τ_{m, n} \frac{B_{V}}{2} \log_{2} \left(1 + \frac{e}{2 π} γ_{m, n}\right) \qquad (3)

where τ_{m, n} is the fraction of resources assigned to user n by AP m, which is determined by a proportional fairness scheduler.
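As a rough illustration of Equations (1)–(3), the following Python sketch evaluates the Lambertian order, the LOS channel gain, the SINR, and the approximate rate for a single link. The function names and all numerical values in the usage lines (photodiode area, responsivity, optical power, bandwidth, noise density) are hypothetical placeholders chosen for illustration, not the parameters used in the paper, which follow [2].

```python
import math

def lambertian_order(half_angle):
    """Lambertian order k for a half-intensity radiation angle given in radians."""
    return -1.0 / math.log2(math.cos(half_angle))

def vlc_gain(d, irradiance, incidence, k, area, fov, refr_index, filter_gain):
    """LOS channel gain of Eq. (1); zero when the incidence angle exceeds the FOV."""
    if not 0.0 <= incidence <= fov:
        return 0.0
    concentrator = refr_index ** 2 / math.sin(fov) ** 2
    return ((k + 1) * area / (2 * math.pi * d ** 2) * concentrator * filter_gain
            * math.cos(irradiance) ** k * math.cos(incidence))

def vlc_sinr(gain, interferer_gains, eta, p_opt, bandwidth, noise_psd):
    """SINR of Eq. (2); interferer_gains are the gains from co-channel VLC APs."""
    signal = (eta * p_opt * gain) ** 2
    interference = sum((eta * p_opt * g) ** 2 for g in interferer_gains)
    return signal / (bandwidth * noise_psd + interference)

def vlc_rate(sinr, bandwidth, tau=1.0):
    """Approximate achievable rate of Eq. (3) in bit/s."""
    return tau * bandwidth / 2 * math.log2(1 + math.e / (2 * math.pi) * sinr)

# Hypothetical example: a user directly below an AP at a distance of 2.5 m.
k = lambertian_order(math.radians(60))
g = vlc_gain(d=2.5, irradiance=0.0, incidence=0.0, k=k, area=1e-4,
             fov=math.radians(90), refr_index=1.5, filter_gain=1.0)
sinr = vlc_sinr(g, [], eta=0.53, p_opt=3.0, bandwidth=20e6, noise_psd=1e-21)
rate_bps = vlc_rate(sinr, bandwidth=20e6)
```

A blocked LOS path can be represented in this sketch simply by passing a gain of zero to `vlc_sinr`.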

According to [23], the WiFi channel gain is written as

G_{1, n} = |H_{1, n}|^{2} \, 10^{\frac{- L (d_{1, n}) + Z_{σ}}{10}} \qquad (4)

where H_{1, n} is the channel transfer function that conforms to a Rayleigh distribution; Z_{σ} is the shadow fading; and L (\cdot) is the free-space path loss, which is expressed as

L (d) = \begin{cases} 20 \log_{10} (f_{c} d) - 147.5, & d < d_{ref} \\ 20 \log_{10} \left(f_{c} \frac{d^{2.75}}{d_{ref}^{1.75}}\right) - 147.5, & d \geq d_{ref} \end{cases} \qquad (5)

Here, f_{c} is the carrier frequency, and d_{ref} is the reference distance. The SINR of the link between the WiFi AP and user n is

γ_{1, n} = \frac{P_{W} G_{1, n}}{B_{W} N_{W}} \qquad (6)

where P_{W} and B_{W} are the power and bandwidth of the WiFi AP, respectively, and N_{W} is the noise power spectral density in WiFi. The data rate obtained is computed as

R_{1, n} = τ_{1, n} B_{W} \log_{2} (1 + γ_{1, n}) \qquad (7)
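The WiFi link model of Equations (4)–(7) can be sketched in Python as follows. Two modeling details are assumptions made here for illustration: the fading power |H|^2 is drawn from a unit-mean exponential distribution (the power of a Rayleigh-distributed amplitude), and the shadow fading Z_σ is drawn as a zero-mean Gaussian in dB. The function names and the numerical values in the usage lines are hypothetical.

```python
import math
import random

def path_loss_db(d, fc_hz, d_ref):
    """Two-slope path loss of Eq. (5) in dB."""
    if d < d_ref:
        return 20 * math.log10(fc_hz * d) - 147.5
    return 20 * math.log10(fc_hz * d ** 2.75 / d_ref ** 1.75) - 147.5

def wifi_gain(d, fc_hz, d_ref, shadow_sigma_db, rng):
    """Channel gain of Eq. (4) with random fading and shadowing (assumed models)."""
    h_sq = rng.expovariate(1.0)              # |H|^2 for unit-power Rayleigh fading
    z_db = rng.gauss(0.0, shadow_sigma_db)   # shadow fading in dB
    return h_sq * 10 ** ((-path_loss_db(d, fc_hz, d_ref) + z_db) / 10)

def wifi_rate(gain, p_tx, bandwidth, noise_psd, tau=1.0):
    """Rate of Eq. (7) in bit/s, using the ratio of Eq. (6)."""
    snr = p_tx * gain / (bandwidth * noise_psd)
    return tau * bandwidth * math.log2(1 + snr)

# Hypothetical example: 2.4 GHz carrier, 10 m reference distance, user 5 m away.
rng = random.Random(0)
gain = wifi_gain(5.0, 2.4e9, 10.0, shadow_sigma_db=3.0, rng=rng)
rate_bps = wifi_rate(gain, p_tx=0.1, bandwidth=20e6, noise_psd=4e-21)
```

Note that the two branches of Equation (5) coincide at d = d_ref, so the path loss is continuous across the reference distance.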

3. Collaborative Online Learning-Based Handover Scheme

The framework of the proposed COLH is illustrated in Figure 2. On the user side, the handover is triggered by Event A2 (i.e., the achieved rate falls below the minimum required rate) and is initiated by sending measurement reports to the CC. On the CC side, the decision to select the next AP is made by learning from the historical handover experiences of the users themselves and similar users. The CC includes two modules, BAPS and FBS, which are elaborated on below.

3.1. MAB-Based AP Selection Module

In the dynamic network, the handover process involves a sequence of AP selection decisions. This means that addressing the FHO issue boils down to finding the sequential APs that maximize the long-term reward. Here, the reward is defined as the user–AP connection time after the handover. In fact, the MAB is exactly the problem in which an agent decides which arm to pull in a number of rounds. In this regard, the AP selection problem for each user can be modeled as the MAB problem.

The BAPS module consists of N bandits, each of which serves a user by selecting the next AP when Event A2 occurs. Each bandit has M arms, each representing an AP. The BAPS module proceeds in discrete time. Once the handover of user n is triggered by A2 at time t, its context vectors with respect to all candidate APs are observed, denoted as C_{t} = \{X_{t, m}, m \in M_{t}\} \subseteq ℝ^{M}, where M_{t} = \{m \in M : R_{m, n} \geq R_{\min}\} is the candidate AP set and R_{\min} is the minimum required rate. In particular, the context vector X_{t, m} is generated by one-hot encoding, with one at the m-th position and zero everywhere else. The bandit serving user n then selects an AP m_{t} from the candidate APs based on their expected rewards and observes the reward (feedback) associated with m_{t} at the next handover trigger time t^{'}, denoted as r_{t, m_{t}} = t^{'} - t. The observation (C_{t}, m_{t}, r_{t, m_{t}}) is then used to improve the AP selection policy. The practical goal of the BAPS module is to find the optimal policy that maximizes the total reward over time. Equivalently, we interpret the goal as minimizing the total regret \sum_{t} q_{t}, where the regret at time t is defined as the difference between the reward of the optimal AP in hindsight and the reward of the AP actually selected, i.e., q_{t} = \max_{m \in M_{t}} r_{t, m} - r_{t, m_{t}}.

One promising solution to the MAB problem is the CBA approach [18]. In CBA, the bandit of user n at time t maintains a coefficient vector w_{n, t} \in ℝ^{M}, an additively updated vector b_{n, t} \in ℝ^{M}, and a correlation matrix K_{n, t} \in ℝ^{M \times M}. The expected reward for any AP m is estimated as a linear regression on the context vector, given by w_{n, t}^{T} X_{t, m}. In addition, the standard confidence bound function is derived as CB_{n, t} (X_{t, m}) = α \sqrt{X_{t, m}^{T} K_{n, t}^{- 1} X_{t, m}}, where α is the exploration parameter. Below, we present how the BAPS module selects APs and updates the bandits:

(1) AP selection. The bandit of user n estimates the expected reward for each candidate AP m using the corresponding cluster-level parameters w_{\overset{\frown}{N}_{m}, t} and CB_{\overset{\frown}{N}_{m}, t} (X_{t, m}), which are distributed by FBS (see Section 3.2 for details).

Based upon this, the bandit of user n selects the AP with the highest upper confidence bound, given by

m_{t} = \underset{m \in M_{t}}{\arg \max} \left[ w_{\overset{\frown}{N}_{m}, t}^{T} X_{t, m} + CB_{\overset{\frown}{N}_{m}, t} (X_{t, m}) \right] \qquad (8)

In this way, the users in the same cluster can select APs in a collaborative way. In contrast, if the bandit of user n uses only its own parameters w_{n, t} and CB_{n, t} (X_{t, m}), the user selects the AP independently. This scheme is termed Non-COLH and serves as our benchmark.

(2) Bandit update. After handing over user n to the selected AP m_{t}, the observed reward is normalized to \bar{r}_{t, m_{t}} = r_{t, m_{t}} / \max_{m \in M_{t}} r_{t, m}, which is further utilized to update the bandit of user n with the following equations:

K_{n, t} = K_{n, t} + X_{t, m_{t}} X_{t, m_{t}}^{T} \qquad (9)

b_{n, t} = b_{n, t} + \bar{r}_{t, m_{t}} X_{t, m_{t}} \qquad (10)

w_{n, t} = K_{n, t}^{- 1} b_{n, t} \qquad (11)

The updated parameters are then shared with the FBS.
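The per-user bandit described above can be sketched with NumPy as follows. This is a minimal illustration of the independent (Non-COLH) case, in which a user relies only on its own parameters; the class and function names are hypothetical. The coefficient update uses the standard least-squares form w = K^{-1} b, which is an assumption about the intent of Equation (11), and AP indices are 0-based here.

```python
import numpy as np

def one_hot(m, num_aps):
    """Context vector X_{t,m}: one at position m, zero elsewhere."""
    x = np.zeros(num_aps)
    x[m] = 1.0
    return x

class UserBandit:
    """Linear bandit of one user with upper-confidence-bound AP selection."""

    def __init__(self, num_aps, alpha=0.16):
        self.num_aps = num_aps
        self.alpha = alpha
        self.K = np.eye(num_aps)      # correlation matrix, K_{n,0} = I
        self.b = np.zeros(num_aps)    # additively updated vector, b_{n,0} = 0
        self.w = np.zeros(num_aps)    # coefficient vector

    def confidence_bound(self, x):
        # alpha * sqrt(x^T K^{-1} x), computed without forming the inverse
        return self.alpha * float(np.sqrt(x @ np.linalg.solve(self.K, x)))

    def ucb(self, m):
        x = one_hot(m, self.num_aps)
        return float(self.w @ x) + self.confidence_bound(x)

    def select(self, candidates):
        # AP with the highest upper confidence bound among the candidates
        return max(candidates, key=self.ucb)

    def update(self, m, normalized_reward):
        x = one_hot(m, self.num_aps)
        self.K += np.outer(x, x)                    # Eq. (9)
        self.b += normalized_reward * x             # Eq. (10)
        self.w = np.linalg.solve(self.K, self.b)    # assumed form of Eq. (11)

# Hypothetical usage: one observed handover to AP 1 with normalized reward 0.8.
bandit = UserBandit(num_aps=3)
bandit.update(1, 0.8)
choice = bandit.select([0, 1, 2])
```

In the collaborative case, the same selection rule would be applied to the cluster-level parameters distributed by FBS instead of the user's own.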

3.2. Feedback Sharing Module

We define the users who have similar trajectories to user n in the vicinity of AP m as its neighborhood with respect to AP m, denoted as N_{m}. Intuitively, user n’s estimate of the expected reward for AP m can benefit from learning its neighbors’ historical experiences of handing over to AP m. Unfortunately, user neighborhoods clustered according to trajectory similarity cannot be obtained directly without users’ location information. In this subsection, we design the FBS module, which can estimate users’ neighborhoods based on feedback similarity only.

The idea comes from the fact that the users within N_{m} typically have similar feedback when handing over to AP m since the feedback r_{t, m} is determined by the user’s post-handover trajectory. Consequently, the estimated neighborhood of user n with respect to AP m, denoted as \overset{\frown}{N}_{m}, can be given by [18]

\overset{\frown}{N}_{m} = \{ j \in N : |w_{n, t}^{T} X_{t, m} - w_{j, t}^{T} X_{t, m}| \leq CB_{n, t} (X_{t, m}) + CB_{j, t} (X_{t, m}) \} \qquad (12)

This indicates that user j will be involved in the neighborhood of user n with respect to AP m if their feedback related to the context vector X_{t, m} is sufficiently close. Afterward, the cluster-level parameters w_{\overset{\frown}{N}_{m}, t} and CB_{\overset{\frown}{N}_{m}, t} (X_{t, m}) are, respectively, calculated by

w_{\overset{\frown}{N}_{m}, t} = \frac{1}{|\overset{\frown}{N}_{m}|} \sum_{j \in \overset{\frown}{N}_{m}} w_{j, t} \qquad (13)

and

CB_{\overset{\frown}{N}_{m}, t} (X_{t, m}) = \frac{1}{|\overset{\frown}{N}_{m}|} \sum_{j \in \overset{\frown}{N}_{m}} CB_{j, t} (X_{t, m}) \qquad (14)

Note that |\overset{\frown}{N}_{m}| denotes the cardinality of set \overset{\frown}{N}_{m}.
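The neighborhood test of Equation (12) and the averaging of Equations (13) and (14) can be illustrated with the following NumPy sketch. The function names and the toy parameter values are hypothetical; the arrays `W` and `Ks` stand for the coefficient vectors and correlation matrices that the users share with FBS, and user and AP indices are 0-based.

```python
import numpy as np

def confidence_bound(K, x, alpha):
    """alpha * sqrt(x^T K^{-1} x) for one user's correlation matrix K."""
    return alpha * float(np.sqrt(x @ np.linalg.solve(K, x)))

def estimate_neighborhood(n, m, W, Ks, alpha=0.16):
    """Eq. (12): indices of users whose estimate for AP m is close to user n's."""
    x = np.zeros(W.shape[1])
    x[m] = 1.0                                   # one-hot context of AP m
    cb = np.array([confidence_bound(K, x, alpha) for K in Ks])
    est = W @ x                                  # w_j^T x for every user j
    neigh = np.flatnonzero(np.abs(est[n] - est) <= cb[n] + cb)
    return neigh, cb

def cluster_parameters(neigh, W, cb):
    """Eqs. (13) and (14): averages of w and CB over the estimated neighborhood."""
    return W[neigh].mean(axis=0), float(cb[neigh].mean())

# Hypothetical toy state: users 0 and 1 report similar feedback for AP 0,
# user 2 does not; each user has 10 past observations of AP 0.
W = np.array([[0.80, 0.0], [0.75, 0.0], [0.10, 0.0]])
Ks = [np.diag([11.0, 1.0]) for _ in range(3)]
neigh, cb = estimate_neighborhood(n=0, m=0, W=W, Ks=Ks)
w_cluster, cb_cluster = cluster_parameters(neigh, W, cb)
```

User n always belongs to its own estimated neighborhood, since the left-hand side of Equation (12) is zero for j = n, so the averages are never taken over an empty set.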

The overall procedure of the proposed COLH is depicted in Algorithm 1.

Algorithm 1: The Proposed COLH

1: Input: exploration parameter α.
2: Init: b_{n, 0} = 0 \in ℝ^{M} and K_{n, 0} = I \in ℝ^{M \times M}, \forall n \in N.
3: while the handover of any user n is triggered by A2 do
4:   Record current time t.
5:   Identify context vector set C_{t} = \{X_{t, m}, m \in M_{t}\}.
FBS module:
6:   for all m \in M_{t} do
7:     Compute neighborhood \overset{\frown}{N}_{m} for AP m according to (12).
8:     Set w_{\overset{\frown}{N}_{m}, t} and CB_{\overset{\frown}{N}_{m}, t} (X_{t, m}) according to (13) and (14), resp.
9:   end for
10:  Distribute the cluster-level w_{\overset{\frown}{N}_{m}, t} and CB_{\overset{\frown}{N}_{m}, t} (X_{t, m}) to the BAPS.
BAPS module:
11:  Select AP m_{t} according to (8) and hand over user n to this AP.
12:  Observe reward \bar{r}_{t, m_{t}} at user n’s next handover trigger time t^{'}.
13:  Update K_{n, t}, b_{n, t}, and w_{n, t} according to (9), (10), and (11), resp.
14:  Share K_{n, t} and w_{n, t} with the FBS.
15: end while
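The loop of Algorithm 1 can be sketched compactly under one observation: with one-hot context vectors and K_{n,0} = I, each correlation matrix stays diagonal, so if the coefficient vector is taken as the least-squares solution w = K^{-1} b (an assumption), the estimate for AP m reduces to sum/(1 + count) and the confidence bound to α/sqrt(1 + count). The sketch below uses this reduction; the function names, the toy reward model, and all numerical values are hypothetical and serve only to show the control flow, not to reproduce the paper's simulation.

```python
import numpy as np

def colh_step(n, candidates, counts, sums, alpha=0.16):
    """One FBS + BAPS decision for user n; returns the selected AP index."""
    w = sums / (1.0 + counts)              # per-user, per-AP reward estimates
    cb = alpha / np.sqrt(1.0 + counts)     # per-user, per-AP confidence bounds
    best, best_score = None, -np.inf
    for m in candidates:
        # Eq. (12): users whose estimate for AP m is close to user n's
        neigh = np.abs(w[n, m] - w[:, m]) <= cb[n, m] + cb[:, m]
        # Eqs. (13), (14), and (8): cluster-level upper confidence bound
        score = w[neigh, m].mean() + cb[neigh, m].mean()
        if score > best_score:
            best, best_score = m, score
    return best

def colh_update(n, m, normalized_reward, counts, sums):
    """Diagonal form of the bandit update for user n and AP m."""
    counts[n, m] += 1
    sums[n, m] += normalized_reward

def run_toy(num_users=3, num_aps=4, rounds=300, alpha=0.16, seed=0):
    """Toy loop: a random user triggers a handover in each round."""
    rng = np.random.default_rng(seed)
    mean_reward = rng.uniform(0.1, 0.9, size=num_aps)  # hypothetical normalized connection times
    counts = np.zeros((num_users, num_aps))
    sums = np.zeros((num_users, num_aps))
    for _ in range(rounds):
        n = int(rng.integers(num_users))
        m = colh_step(n, range(num_aps), counts, sums, alpha)
        reward = float(np.clip(mean_reward[m] + 0.05 * rng.standard_normal(), 0.0, 1.0))
        colh_update(n, m, reward, counts, sums)
    return counts, mean_reward

counts, mean_reward = run_toy()
```

In this toy loop every AP is always a candidate; in the scheme itself the candidate set M_{t} is restricted to APs that meet the minimum required rate.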

4. Performance Evaluation

For simulations, we consider a 16 \times 16 \times 3 m^{3} office-type indoor room with 1 WiFi AP and 16 VLC APs. The separation between the nearest VLC APs is set to 4 m, and the vertical distance between each user and the VLC AP is set to 2.5 m. Similar to [15], we generate a series of trajectories for each user over N_{d} consecutive days and introduce a random mobility parameter γ \in [0, 1] to control the inter-trajectory dependence. For example, γ = 0.4 means that the user follows random trajectories on 40% of the total days (i.e., 0.4 N_{d} days) and follows a regular trajectory on the other days. Each day, the user moves at a speed of 1 m/s according to the RWP model [20] until reaching 1000 iterations (time units), and a trajectory is thus generated. Note that the trajectories for different users are generated independently. Following [21], we use a cylindrical object with a height of 1.75 m and a radius of 0.15 m to model each obstacle, and the obstacles are assumed to be randomly distributed. Here, an obstacle refers to any physical or environmental object that can interfere with or attenuate a signal’s propagation path, strength, or quality. If the line segment between the user and the AP intersects an obstacle, the LOS path is considered to be blocked. The parameters related to the WiFi and VLC channels follow [2]. The hyper-parameter α is set to 0.16 empirically. Unless otherwise stated, the number of users, the number of obstacles, γ, and R_{\min} are set to 3, 20, 0.4, and 120 Mbps, respectively. To reflect the average QoE of an arbitrary user, we introduce two metrics as in [5], namely the average number of handovers per user (ANH) and the average lasting time of each connection (ALT). The results are averaged over 500 runs to mitigate the randomness of the trajectory and obstacle distribution.
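The geometric blockage test can be sketched as a segment-versus-cylinder check. The sketch below is one possible reading of such a model, not the exact procedure of [21]: the obstacle is a vertical cylinder standing on the floor (z = 0), the receiver is assumed to be lower than the AP, and the coordinates in the usage lines are hypothetical.

```python
import math

def los_blocked(user, ap, obstacle_xy, radius=0.15, height=1.75):
    """Return True if the user-AP segment passes through the cylinder.

    user, ap: (x, y, z) positions with the user below the AP (assumption).
    obstacle_xy: (x, y) of the cylinder axis; the cylinder spans z in [0, height].
    """
    ux, uy, uz = user
    ax, ay, az = ap
    if az <= uz:
        raise ValueError("this sketch assumes the AP is above the user")
    if uz > height:
        return False                       # the whole segment is above the obstacle
    # Fraction of the segment (from the user) lying at or below the obstacle top
    s_max = min(1.0, (height - uz) / (az - uz))
    dx, dy = ax - ux, ay - uy
    norm_sq = dx * dx + dy * dy
    if norm_sq == 0.0:
        s = 0.0                            # AP directly above the user
    else:
        # Parameter of the closest point to the cylinder axis, clamped to [0, s_max]
        s = ((obstacle_xy[0] - ux) * dx + (obstacle_xy[1] - uy) * dy) / norm_sq
        s = max(0.0, min(s_max, s))
    px, py = ux + s * dx, uy + s * dy
    return math.hypot(px - obstacle_xy[0], py - obstacle_xy[1]) <= radius

# Hypothetical geometry: AP on a 3 m ceiling, receiver held 0.5 m above the floor.
ap = (0.0, 0.0, 3.0)
user = (4.0, 0.0, 0.5)
near_user = los_blocked(user, ap, (3.5, 0.0))   # obstacle close to the user
near_ap = los_blocked(user, ap, (1.0, 0.0))     # the ray is already above 1.75 m here
```

The second case shows why height matters: an obstacle standing under the ray but close to the AP does not block the link, because the ray has already risen above the top of the cylinder at that point.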

First, Figure 3 shows the convergence of the proposed COLH in terms of the ANH under different values of γ (i.e., 0, 0.4, and 1). Here, N_{d} = 20. We can see that COLH gradually converges under all values as the number of iterations increases. Nevertheless, under a larger value of γ, COLH converges to a higher ANH at a slower convergence rate. This is because larger mobility randomness requires more samples during the learning process of COLH.

Next, we compare COLH with three benchmarks over the ANH and ALT: (1) rate-first handover (RFH); (2) non-COLH (NCOLH); and (3) SMART [19]. Specifically, RFH selects the AP that provides the highest rate in the present, while both NCOLH and SMART select the AP that provides the longest connection time in the future. In NCOLH, each user has its own bandit and performs independent learning. In SMART, all users share a common bandit and perform collective learning.

Figure 4 depicts the comparison of the ANH and ALT obtained under different schemes by varying the number of users from one to nine after 15,000 iterations. We see that the performance of all schemes degrades as the number of users increases due to the ensuing resource competition. More importantly, the “short-sighted” RFH exhibits the worst performance, while COLH exhibits the best one among its “long-sighted” counterparts. Moreover, the gap between COLH and its benchmarks grows as the number of users increases. Although all users are of the same service type (because of the same rate requirements [19]), their trajectories are different but somewhat similar. As a result, the increase in the number of users exacerbates the disadvantage of collective learning in SMART and enlarges the advantage of collaborative learning in COLH. Concretely, when nine users are involved, the ALT obtained under COLH is larger than that under NCOLH, SMART, and RFH by 4.3%, 13.4%, and 38.5%, respectively. The performance gains obtained reflect not only the superiority of making “long-sighted” decisions through online learning but also the effectiveness of the collaborative effects incorporated by using feedback similarity.

Figure 5 depicts the comparison of the ANH and ALT obtained under different schemes by varying the number of obstacles from 10 to 30 after 15,000 iterations. As the number of obstacles increases, the ANH increases and the ALT decreases regardless of the scheme. This trend is consistent with the intuition that more handovers occur in complex environments with many obstacles than in simple environments with few obstacles. Furthermore, COLH always outperforms the benchmarks, especially in complex environments.

5. Conclusions

In this paper, a novel handover scheme for hybrid VLC/RF networks was proposed, through which the users in the same cluster collaborate to select APs. By sharing feedback, this scheme clusters similar users without involving any user trajectory, which improves its practicality and protects user privacy. On this basis, the feedback similarities across users are utilized for a smarter estimation of expected rewards and a further reduction in handovers in the long run. By incorporating the collaborative effects of the user clusters, the proposed scheme has been verified to outperform its benchmarks in reducing handovers.

Author Contributions

Conceptualization, data curation, investigation, methodology, resources, and simulation, S.M. and S.H.; formal analysis and supervision, K.Z.; visualization and writing—original draft, S.M. and X.L.; validation and writing—review and editing, S.M., S.H., Z.X. and J.M. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Basic research funds for universities in the Xinjiang Uygur Autonomous Region in 2024 under Grant No. XJEDU2024P064. This work was also supported by the natural science foundation of Xinjiang Uygur Autonomous Region in 2024 under Grant No. 2024D01A12. This work was also supported by the doctoral research projects of Xinjiang Institute of Engineering under Grant No. 2025XGYBQJ31.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

Author Jihang Mi was employed by the company Digital Technology Company Limited of Aerospace Science and Industry Corp. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AP	Access point
ANH	Average number of handovers
ALT	Average lasting time
BAPS	MAB-based AP selection
COLH	Collaborative online learning-based handover
CBA	Clustering of bandits approach
CC	Central controller
CCI	Co-channel interference
FHO	Frequent handovers
FBS	Feedback sharing
LiFi	Light fidelity
MAB	Multi-armed bandit
NCOLH	Non-collaborative online learning-based handover
VLC/RF	Visible light communication/radio frequency
VLC	Visible light communication
RF	Radio frequency
RWP	Random waypoint mobility
RFH	Rate-first handover
WiFi	Wireless fidelity

References

1. Arshad, R.; Lampe, L. Stochastic Geometry Analysis of User Mobility in RF/VLC Hybrid Networks. IEEE Trans. Wirel. Commun. 2021, 20, 7404–7419.
2. Huang, S.; Chuai, G.; Gao, W. Two-Way Selection Handover Algorithm for Load Balancing in Hybrid VLC-RF Networks. In Proceedings of the 2021 IEEE/CIC International Conference on Communications in China (ICCC), Xiamen, China, 28–30 July 2021; pp. 1065–1070.
3. Basnayaka, D.A.; Haas, H. Hybrid RF and VLC Systems: Improving User Data Rate Performance of VLC Systems. In Proceedings of the 2015 IEEE 81st Vehicular Technology Conference (VTC Spring), Glasgow, UK, 11–14 May 2015; pp. 1–5.
4. Sun, L.; Hou, J.; Shu, T. Spatial and Temporal Contextual Multi-Armed Bandit Handovers in Ultra-Dense mmWave Cellular Networks. IEEE Trans. Mob. Comput. 2021, 20, 3423–3438.
5. Bao, X.; Okine, A.; Shi, L.; Bao, N.; Adjardjah, W. Channel Adaptive Dwell Timer for Vertical Handoff in Hybrid VLC and Wi-Fi Networks. In Proceedings of the 2018 IEEE/CIC International Conference on Communications in China (ICCC), Beijing, China, 16–18 August 2018; pp. 609–613.
6. Liu, R.; Zhang, C. Dynamic dwell timer for vertical handover in VLC-WLAN heterogeneous networks. In Proceedings of the 2017 13th International Wireless Communications and Mobile Computing Conference (IWCMC), Valencia, Spain, 26–30 June 2017; pp. 1256–1260.
7. Liang, S.; Zhang, Y.; Fan, B.; Tian, H. Multi-Attribute Vertical Handover Decision-Making Algorithm in a Hybrid VLC-Femto System. IEEE Commun. Lett. 2017, 21, 1521–1524.
8. Camporez, H.; Costa, W.; Segatto, M.; Silva, J.; Deters, J.K.; Wörtche, H. AI-Driven Enhancements for Handover in Visible Light Communication Systems. J. Light. Technol. 2024, 42, 8191–8202.
9. Wu, X.; O’Brien, D.C.; Deng, X.; Linnartz, J.-P.M.G. Smart Handover for Hybrid LiFi and WiFi Networks. IEEE Trans. Wirel. Commun. 2020, 19, 8211–8219.
10. Wu, X.; Haas, H. Handover skipping for LiFi. IEEE Access 2019, 7, 38369–38378.
11. Wu, X.; O’Brien, D.C. A Novel Machine Learning-Based Handover Scheme for Hybrid LiFi and WiFi Networks. In Proceedings of the 2020 IEEE Globecom Workshops (GC Wkshps), Taipei, Taiwan, 7–11 December 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1–5.
12. Arfaoui, M.A.; Ghrayeb, A.; Assi, C. Cascaded Artificial Neural Networks for Proactive Power Allocation in Indoor LiFi Systems. In Proceedings of the ICC 2021-IEEE International Conference on Communications, Montreal, QC, Canada, 14–23 June 2021; pp. 1–6.
13. Wu, Z.-Y.; Ismail, M.; Serpedin, E.; Wang, J. Efficient Prediction of Link Outage in Mobile Optical Wireless Communications. IEEE Trans. Wirel. Commun. 2021, 20, 882–896.
14. Senaratne, H.; Mueller, M.; Behrisch, M.; Lalanne, F.; Bustos-Jiménez, J.; Schneidewind, J.; Keim, D.; Schreck, T. Urban Mobility Analysis With Mobile Network Data: A Visual Analytics Approach. IEEE Trans. Intell. Transp. Syst. 2018, 19, 1537–1546.
15. Peng, Y.; Zhou, Y.; Liu, L.; Li, J.; Pan, Z.; Sun, G. Intelligent Recommendation-Based User Plane Handover With Enhanced TCP Throughput in Ultra-Dense Cellular Networks. IEEE Trans. Veh. Technol. 2022, 71, 595–610.
16. Fonseca, D.F.; Guzman, B.G.; Martena, G.L.; Bian, R.; Haas, H.; Giustiniano, D. Prediction-model-assisted reinforcement learning algorithm for handover decision-making in hybrid LiFi and WiFi networks. J. Opt. Commun. Netw. 2024, 16, 159–170.
17. Ma, G.; Parthiban, R.; Karmakar, N. An artificial neural network-based handover scheme for hybrid LiFi networks. IEEE Access 2022, 10, 130350–130358.
18. Gentile, C.; Li, S.; Kar, P.; Karatzoglou, A.; Zappella, G.; Etrue, E. On context-dependent clustering of bandits. Proc. Int. Conf. Mach. Learn. 2017, 70, 1253–1262.
19. Sun, Y.; Feng, G.; Qin, S.; Liang, Y.-C.; Yum, T.-S.P. The SMART Handoff Policy for Millimeter Wave Heterogeneous Cellular Networks. IEEE Trans. Mob. Comput. 2018, 17, 1456–1468.
20. Soltani, M.D.; Arfaoui, M.A.; Tavakkolnia, I.; Ghrayeb, A.; Safari, M.; Assi, C.M.; Hasna, M.O.; Haas, H. Bidirectional Optical Spatial Modulation for Mobile Users: Toward a Practical Design for LiFi Systems. IEEE J. Sel. Areas Commun. 2019, 37, 2069–2086.
21. Chen, C.; Basnayaka, D.A.; Purwita, A.A.; Wu, X.; Haas, H. Wireless Infrared-Based LiFi Uplink Transmission With Link Blockage and Random Device Orientation. IEEE Trans. Commun. 2021, 69, 1175–1188.
22. Demir, M.S.; Uysal, M. A Cross-Layer Design for Dynamic Resource Management of VLC Networks. IEEE Trans. Commun. 2021, 69, 1858–1867.
23. Perahia, E.; Stacey, R. Next Generation Wireless LAN: 802.11n and 802.11ac; Cambridge University Press: Cambridge, UK, 2013.

Figure 1. Hybrid VLC/RF network diagram.

Figure 2. Framework of collaborative online learning-based handover scheme.

Figure 3. Convergence performance of COLH.

Figure 4. Comparison between (a) ANH and (b) ALT with different numbers of users.

Figure 5. Comparison between (a) ANH and (b) ALT with different numbers of obstacles.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
