Abstract
This paper investigates handover in hybrid visible light communication (VLC)/radio frequency (RF) networks. In such networks, mobile users are prone to frequent handovers (FHOs). To address this problem, we propose a collaborative online learning-based handover (COLH) scheme for hybrid VLC/RF 5G systems. When selecting the next access point (AP) to which a user should hand over, our goal is to make the user–AP connection last as long as possible after the handover; this connection time is defined as a reward that is learned online through a multi-armed bandit (MAB) framework. Unlike previous schemes based on independent or collective learning, our scheme first dynamically clusters users with similar feedback on a given AP. The users in the same cluster then collaborate in estimating the expected reward for that AP, and the AP with the maximum expected reward is selected as the next AP. The scheme requires neither extensive offline training nor location information, which greatly enhances its practicality. Simulation results show that the proposed scheme outperforms existing benchmarks in reducing handovers.
1. Introduction
With the explosive growth of mobile traffic, the requirements on transmission delay, operational cost, and user experience, together with the scarcity of radio frequency (RF) spectrum, will become pressing problems in the future. Visible light communication (VLC) has many advantages over RF, including access to an abundant, free, and unregulated light spectrum and the ability to provide high-speed data transmission. Moreover, there is no electromagnetic interference between VLC and RF. Therefore, integrating VLC into the existing RF network can overcome RF limitations and improve system performance [1,2]. In addition, compared with standalone VLC networks, adding VLC to existing RF networks also improves system throughput [3].
The resulting hybrid VLC/RF 5G network, which combines the high-speed data transmission of VLC and the ubiquitous coverage of RF, has been proven to provide better network performance than a standalone VLC or RF network [3]. However, mobile users in this network readily experience frequent handovers (FHOs) due to three factors: (i) the small coverage of a VLC access point (AP); (ii) the vulnerability of VLC links to blockage; and (iii) the tricky handover decisions in the case of multiple candidate APs with overlapping coverage. The large number of handovers will cause heavy signaling overhead, higher outage probability, and increased power consumption. As such, designing an efficient handover scheme to address the FHO issue is highly important in hybrid VLC/RF 5G networks.
Conventional handover schemes rely on the current channel state while ignoring its future variations and are thus regarded as “short-sighted” [4]. Such schemes are inefficient in VLC-enabled networks because VLC channels vary abruptly. Research on reducing handovers in hybrid VLC/RF 5G networks is still at an early stage. Analyses based on historical data are often used to optimize handover [5,6]. The analytic hierarchy process and a cooperative game method are used in [7] to determine whether to perform a vertical handover. Researchers have also proposed an improved genetic algorithm method [8], which uses time series classification to minimize the number of handovers in VLC systems. To avoid unnecessary handovers, the authors of [9] leveraged the rate of variation in received signal strength to indicate whether a user is moving towards a certain AP, and the authors of [10] proposed a power-based handover scheme to skip unnecessary handovers. On this basis, the authors of [11] further optimized the dynamic adjustment of the selection preference between VLC and RF via machine learning. Nevertheless, these methods only capture very near-future variations in the channel state within the time-to-trigger period (i.e., hundreds of milliseconds). Once the user changes its moving direction or encounters an unexpected obstacle soon after the decision is made, a new unnecessary handover will be triggered. An intuitive solution is to make a “long-sighted” decision by predicting the post-decision trajectory [12] or link blockage [13]. Interestingly, the authors of [14] stated that the trajectories of different users exhibit similarities. This trajectory similarity was then exploited to improve the prediction accuracy in [15]. More recently, the authors of [16] described a classification model that predicts the type of a user’s trajectory and assists a reinforcement learning model in making handover decisions that dynamically adapt to new network conditions.
Despite this, the prediction-based solutions require extensive offline training and users’ location information, which are not always available in practice. In [17], the researchers developed a neural network-based handover scheme that classifies the handover moments between light fidelity (LiFi) and wireless fidelity (WiFi) based on channel quality, user movement, and device direction.
Multi-armed bandit (MAB) is a powerful online learning technique in which an agent learns from interactions with the network to maximize its expected reward, enabling it to reach the “long-sighted” decision in dynamic networks. Accordingly, this paper proposes a collaborative online learning-based handover scheme (COLH) in hybrid VLC/RF 5G networks, aiming to maximize the expected user–AP connection time after each handover. Aided by the clustering of bandits (CBA) approach [18], our scheme can be divided into two modules, i.e., MAB-based AP selection (BAPS) and feedback sharing (FBS). First, FBS dynamically clusters similar users according to the shared feedback, which indirectly exploits the trajectory similarity without requiring location information. Then, in BAPS, the users in the same cluster can select APs in a collaborative way. That is, each user selects the AP by learning from the historical handover experiences of both themselves and similar users.
Briefly, the main contributions are as follows: (1) unlike independent and collective learning [19], we are the first to propose a collaborative online learning framework for handovers in hybrid VLC/RF 5G networks; (2) instead of using users’ location information, our scheme exploits the trajectory similarity across users using only the shared feedback, making it practical and privacy-preserving; and (3) the results verify the superiority of our scheme over the benchmarks in reducing handovers.
2. System Model
We consider a multi-user hybrid network with one WiFi AP and multiple VLC APs, as depicted in Figure 1. Let and denote the sets of users and APs, respectively. The WiFi AP (denoted as AP 1), as a typical indoor RF technique, is deployed at the center. The VLC APs are deployed on the ceiling in a lattice topology. Specifically, the number 1 indicates the WiFi AP, and the numbers 2–10 indicate the VLC APs. All the WiFi and VLC APs are linked via fiber to a central controller (CC), which takes charge of handover in this network. There is no interference between the WiFi AP and any VLC AP, while co-channel interference (CCI) exists where the coverage areas of the VLC APs overlap. We apply a frequency reuse factor of 4 to mitigate the CCI [9]. In addition, we use time division multiple access to allow the APs to serve multiple users [11]. The widely used random waypoint (RWP) mobility model [20] and a geometric model [21] are employed to generate the user’s mobility pattern and link blockage, respectively.
Figure 1. Hybrid VLC/RF network diagram.
If the LOS path is unblocked, the VLC channel gain is given by [2]
Otherwise, , where . is the distance between AP and user ; and are the angles of irradiance and incidence, respectively. For transmitter , is the Lambertian order, and is the half-intensity radiation angle. For receiver , is the refractive index, is the optical filter gain, and and are the physical area and field-of-view of photodiode, respectively. The signal-to-interference-plus-noise ratio (SINR) of the link between AP and user is
where is the detector responsivity; and denote the optical power and bandwidth of the VLC AP, respectively; is the noise power spectral density in VLC; and denotes the set of VLC APs that use the same frequency spectrum as AP . Here, the achievable data rate can be approximated as [22]
where is the fraction of resources assigned to user by AP , which is determined by a proportional fairness scheduler.
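To make the VLC link model more concrete, the following is a minimal Python sketch of the widely used Lambertian line-of-sight gain and a co-channel SINR built from the quantities defined above. The function names, argument names, default values, and the exact SINR form (squared received electrical signal over noise plus squared interference) are assumptions for illustration and are not taken from the paper; the rate approximation of [22] is intentionally left out.

```python
import math


def lambertian_order(half_intensity_angle):
    """Lambertian order from the half-intensity radiation angle (radians)."""
    return -math.log(2.0) / math.log(math.cos(half_intensity_angle))


def vlc_los_gain(distance, irradiance, incidence, half_intensity_angle,
                 pd_area, fov, filter_gain=1.0, refractive_index=1.5):
    """LOS gain of a Lambertian emitter; zero outside the receiver field of view.

    Angles are in radians. Default filter gain and refractive index are
    illustrative placeholders, not values from the paper.
    """
    if incidence < 0.0 or incidence > fov:
        return 0.0
    m = lambertian_order(half_intensity_angle)
    # Gain of an idealised optical concentrator with the given refractive index.
    concentrator_gain = refractive_index ** 2 / math.sin(fov) ** 2
    return ((m + 1.0) * pd_area / (2.0 * math.pi * distance ** 2)
            * math.cos(irradiance) ** m
            * filter_gain * concentrator_gain * math.cos(incidence))


def vlc_sinr(serving_gain, interferer_gains, responsivity, optical_power,
             noise_psd, bandwidth):
    """Electrical SINR with interference from VLC APs reusing the same band (assumed form)."""
    signal = (responsivity * optical_power * serving_gain) ** 2
    interference = sum((responsivity * optical_power * g) ** 2
                       for g in interferer_gains)
    return signal / (noise_psd * bandwidth + interference)
```

A blocked LOS path can be represented in this sketch by simply passing a gain of zero for the affected link.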
According to [23], the WiFi channel gain is written as
where is the channel transfer function that conforms to a Rayleigh distribution; is the shadow fading; is the free-space path loss, which is expressed as
is the carrier frequency, and is the reference distance. The SINR of the link between WiFi AP and user is
where and are the power and bandwidth of the WiFi AP, respectively; is the noise power spectral density in WiFi. The data rate obtained is computed as
3. Collaborative Online Learning-Based Handover Scheme
The framework of the proposed COLH is illustrated in Figure 2. On the user side, the handover is triggered by Event A2 (i.e., the achieved rate falls below the minimum required rate) and is initiated by sending measurement reports to the CC. On the CC side, the decision to select the next AP is made by learning from the historical handover experiences of the users themselves and of similar users. The CC includes two modules, BAPS and FBS, which are elaborated on below.
Figure 2. Framework of collaborative online learning-based handover scheme.
3.1. MAB-Based AP Selection Module
In a dynamic network, the handover process involves a sequence of AP selection decisions. Addressing the FHO issue therefore boils down to finding the sequence of APs that maximizes the long-term reward. Here, the reward is defined as the user–AP connection time after the handover. The MAB is precisely the problem in which an agent decides which arm to pull over a number of rounds. In this regard, the AP selection problem for each user can be modeled as an MAB problem.
The BAPS module consists of bandits, each of which serves a user by selecting the next AP when Event A2 occurs. Each bandit has arms, each representing an AP. The BAPS module proceeds in discrete time steps. When the handover of user is triggered by A2 at time , its context vectors with respect to all candidate APs are observed, denoted as , where is the candidate AP set and is the minimum required rate. In particular, the context vector is generated by applying one-hot encoding with one at the -th position and zero everywhere else. The bandit serving user n then selects an AP from the candidate APs based on their expected rewards and observes the reward (feedback) associated with at the next handover trigger time , denoted as . This observation is used to improve the AP selection policy. The practical goal of the BAPS module is to find the optimal policy that maximizes the total reward over time. Equivalently, we interpret the goal as minimizing total regret , where the regret at time is defined as the difference between the reward of the optimal AP in hindsight and the reward of the AP actually selected, i.e., .
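The following minimal Python sketch illustrates the ingredients described above: one-hot context vectors, a candidate AP set, and the total regret. The helper names are hypothetical, the 0-based AP index is a coding convenience, and the rule that a candidate AP must offer at least the minimum required rate is an assumption about how the candidate set is formed.

```python
def one_hot_context(ap_index, num_aps):
    """Context vector with 1 at the AP's position (0-based here) and 0 elsewhere."""
    if not 0 <= ap_index < num_aps:
        raise ValueError("ap_index out of range")
    vector = [0.0] * num_aps
    vector[ap_index] = 1.0
    return vector


def candidate_aps(rates, min_rate):
    """Indices of APs whose achievable rate meets the minimum requirement (assumed rule)."""
    return [ap for ap, rate in enumerate(rates) if rate >= min_rate]


def total_regret(optimal_rewards, obtained_rewards):
    """Sum over decisions of (reward of the best AP in hindsight - reward obtained)."""
    return sum(best - got for best, got in zip(optimal_rewards, obtained_rewards))
```

For instance, with three APs offering rates of 5, 20, and 12 units and a minimum required rate of 10, only the second and third APs would be candidates under this assumed rule.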
One promising solution to the MAB problem is the CBA approach [18]. In CBA, each bandit of user at time maintains a coefficient vector , an additively updated vector , and a correlation matrix . The expected reward for any AP is estimated as a linear regression on the context vector, given by . In addition, the standard confidence bound function is derived as , where is the exploration parameter. Below, we present how the BAPS module selects APs and updates the bandits:
(1) AP selection. The bandit of user estimates the expected reward for each candidate AP using the corresponding cluster-level parameters and , which are distributed by FBS (see Section 3.2 for details).
Based upon this, the bandit of selects the AP with the highest upper confidence bound, given by
In this way, the users in the same cluster can select APs in a collaborative way. In contrast, if the bandit of user uses only its own parameters and , the user selects the AP independently. This scheme is termed Non-COLH and serves as our benchmark.
(2) Bandit update. After handing over user to the selected AP , the observed reward is normalized to , which is further utilized to update the bandit of user with the following equations:
The updated parameters are then shared by FBS.
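As a rough illustration of the selection and update steps, the sketch below implements a per-user linear bandit that uses only its own parameters, i.e., the independent (Non-COLH) variant. It relies on several assumptions that are not stated in the text: the correlation matrix is initialized to the identity and the additively updated vector to zero, the confidence bound takes the common form α·sqrt(xᵀM⁻¹x·log(t+1)), and rewards are normalized to [0, 1]. Because the context vectors are one-hot, the correlation matrix stays diagonal, so only its diagonal is stored. The class and method names are hypothetical.

```python
import math


class DiagonalLinearBandit:
    """Per-user linear UCB bandit specialised to one-hot contexts (a sketch)."""

    def __init__(self, num_aps, alpha=1.0):
        self.alpha = alpha                 # exploration parameter
        self.m_diag = [1.0] * num_aps      # diagonal of the correlation matrix (identity init, assumed)
        self.b = [0.0] * num_aps           # additively updated vector

    def estimate(self, ap):
        """Expected reward w^T x with w = M^{-1} b and x one-hot at `ap`."""
        return self.b[ap] / self.m_diag[ap]

    def confidence(self, ap, t):
        """Confidence bound alpha * sqrt(x^T M^{-1} x * log(t + 1)) (assumed form)."""
        return self.alpha * math.sqrt(math.log(t + 1) / self.m_diag[ap])

    def select(self, candidates, t):
        """Pick the candidate AP with the highest upper confidence bound."""
        return max(candidates,
                   key=lambda ap: self.estimate(ap) + self.confidence(ap, t))

    def update(self, ap, reward):
        """Rank-one update M += x x^T, b += reward * x for the one-hot x of `ap`."""
        self.m_diag[ap] += 1.0
        self.b[ap] += reward
```

In the collaborative scheme, the `estimate` and `confidence` values used inside `select` would be replaced by the cluster-level quantities distributed by FBS, while `update` would still act on the user's own parameters.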
3.2. Feedback Sharing Module
We define the set of users who have similar trajectories to user in the vicinity of AP as its neighborhood with respect to AP , denoted as . Intuitively, user ’s estimate of the expected reward for AP can benefit from its neighbors’ historical experiences of handing over to AP m. Unfortunately, a neighborhood defined by trajectory similarity cannot be obtained directly without users’ location information. In this subsection, we design the FBS module, which estimates users’ neighborhoods based on feedback similarity only.
The idea comes from the fact that the users within typically have similar feedback when handing over to AP since the feedback is determined by the user’s post-handover trajectory. Consequently, the estimated neighborhood of user with respect to AP , denoted as , can be given by [18]
This indicates that user will be involved in the neighborhood of user with respect to AP if their feedback related to the context vector is sufficiently close. Afterward, the cluster-level parameters and are, respectively, calculated by
and
Note that denotes the cardinality of set .
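The feedback-sharing step can be sketched as follows. The membership test used here (two users are neighbors for an AP when their reward estimates differ by no more than the sum of their confidence bounds) and the plain averaging of estimates and confidence bounds over the neighborhood follow the usual clustering-of-bandits recipe; they are assumptions about the exact form of the paper's expressions. The dictionary layout and function names are hypothetical.

```python
def estimate_neighborhood(user, ap, estimates, confidences):
    """Users whose reward estimate for `ap` is close to that of `user`.

    `estimates[u][ap]` and `confidences[u][ap]` hold each user's own reward
    estimate and confidence bound for the AP. The user is always included.
    """
    return [
        other for other in estimates
        if abs(estimates[user][ap] - estimates[other][ap])
        <= confidences[user][ap] + confidences[other][ap]
    ]


def cluster_parameters(neighborhood, ap, estimates, confidences):
    """Average the reward estimates and confidence bounds over the neighborhood."""
    size = len(neighborhood)  # cardinality of the estimated neighborhood
    mean_estimate = sum(estimates[u][ap] for u in neighborhood) / size
    mean_confidence = sum(confidences[u][ap] for u in neighborhood) / size
    return mean_estimate, mean_confidence
```

Note that only reward estimates and confidence bounds are exchanged in this sketch; no position or trajectory data is needed to form the neighborhood.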
The overall procedure of the proposed COLH is depicted in Algorithm 1.
| Algorithm 1: The Proposed COLH |
| 1: Input: exploration parameter . |
| 2: Init: , and . |
| 3: while the handover of any user is triggered by A2 do |
| 4: Record current time . |
| 5: Identify context vector set . |
| FBS module: |
| 6: for all do |
| 7: Compute neighborhood for AP according to (12). |
| 8: Set and according to (13) and (14), resp. |
| 9: end for |
| 10: Distribute the cluster-level to the BAPS. |
| BAPS module: |
| 11: Select AP according to (8) and hand over user to this AP. |
| 12: Observe reward at the user n’s next handover trigger time . |
| 13: Update , , and according to (9), (10), (11), resp. |
| 14: Share and with the FBS. |
| 15: end while |
4. Performance Evaluation
For simulations, we consider an office-type indoor room with 1 WiFi AP and 16 VLC APs. The separation between the nearest VLC APs is set to , and the vertical distance between each user and the VLC AP is set to . Similar to [15], we generate a series of trajectories for each user over consecutive days and introduce a random mobility parameter to control the inter-trajectory dependence. For example, means that the user follows random trajectories in 40% of the total days (i.e., ) and follows a regular trajectory on the other days. Each day, the user moves at a speed of according to the RWP model [20] until reaching iterations (time units), and a trajectory is thus generated. Note that the trajectories for different users are generated independently. Following [21], we use a cylindrical object with a height of and a radius of to model the obstacle, which is assumed to be randomly distributed. Here, an obstacle refers to any physical or environmental object that can interfere with or attenuate a signal’s propagation path, strength, or quality. If the line segment between the user and the AP intersects an obstacle, the LOS path is considered to be blocked. The parameters related to the WiFi and VLC channels follow [2]. The hyper-parameter is set to empirically. Unless otherwise stated, the number of users and the number of obstacles, and , are set to and , respectively. To reflect the average quality of experience (QoE) of an arbitrary user, we introduce two metrics as in [5], namely the average number of handovers per user (ANH) and the average lasting time per connection (ALT). The results are averaged over times to mitigate the randomness of the trajectory and obstacle distribution.
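The two metrics can be computed from a simulation log as in the following sketch. The function name and the log layout (one list of handover times per user) are hypothetical, and the sketch assumes that every user stays connected to some AP for the whole observation horizon, so that each handover starts exactly one new connection; the paper's exact bookkeeping may differ.

```python
def handover_metrics(handover_times_per_user, horizon):
    """Return (ANH, ALT) for users observed over the interval [0, horizon].

    ANH: average number of handovers per user.
    ALT: average lasting time per connection, assuming each user is always
         connected (one initial connection plus one per handover).
    """
    num_users = len(handover_times_per_user)
    if num_users == 0:
        raise ValueError("at least one user is required")
    total_handovers = sum(len(times) for times in handover_times_per_user)
    total_connections = total_handovers + num_users
    anh = total_handovers / num_users
    alt = num_users * horizon / total_connections
    return anh, alt
```

Under these assumptions the two metrics move in opposite directions: fewer handovers over the same horizon directly translate into longer average connections.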
First, Figure 3 shows the convergence of the proposed COLH in terms of the ANH under different values (i.e., and ). Here, . We can see that COLH gradually converges under all values as the number of iterations increases. Nevertheless, under a larger value of , COLH converges to a higher ANH at a slower convergence rate. This is because the larger mobility randomness during the learning process of COLH requires more samples.
Figure 3. Convergence performance of COLH.
Next, we compare COLH with three benchmarks in terms of the ANH and ALT: (1) rate-first handover (RFH); (2) Non-COLH (NCOLH); and (3) SMART [19]. Specifically, RFH selects the AP that currently provides the highest rate, while both NCOLH and SMART select the AP expected to provide the longest connection time in the future. In NCOLH, each user has its own bandit and performs independent learning. In SMART, all users share a common bandit and perform collective learning.
Figure 4 compares the ANH and ALT obtained under different schemes when the number of users varies from one to nine, after 15,000 iterations. We see that the performance of all schemes degrades as the number of users increases due to the ensuing resource competition. More importantly, the “short-sighted” RFH exhibits the worst performance, while COLH performs best among the “long-sighted” schemes. Moreover, the gap between COLH and its benchmarks grows as the number of users increases. Although all users are of the same service type (because they have the same rate requirements [19]), their trajectories are different but somewhat similar. As a result, the increase in the number of users exacerbates the disadvantage of collective learning in SMART and enlarges the advantage of collaborative learning in COLH. Concretely, when nine users are involved, the ALT obtained under COLH is larger than that under NCOLH, SMART, and RFH by 4.3%, 13.4%, and 38.5%, respectively. These performance gains reflect not only the superiority of making “long-sighted” decisions through online learning but also the effectiveness of the collaboration enabled by feedback similarity.
Figure 4. Comparison between (a) ANH and (b) ALT with different numbers of users.
Figure 5 compares the ANH and ALT obtained under different schemes when the number of obstacles varies from 10 to 30, after 15,000 iterations. As the number of obstacles increases, the ANH increases and the ALT decreases regardless of the scheme. This trend is consistent with the intuition that more handovers occur in complex environments with many obstacles than in simple environments with few obstacles. Furthermore, COLH always outperforms the benchmarks, especially in complex environments.
Figure 5. Comparison between (a) ANH and (b) ALT with different numbers of obstacles.
5. Conclusions
In this paper, a novel handover scheme for hybrid VLC/RF networks was proposed, in which the users in the same cluster collaborate to select APs. By sharing feedback, this scheme clusters similar users without requiring any user trajectory, which improves its practicality and protects user privacy. On this basis, the feedback similarities across users are utilized for a more accurate estimation of expected rewards and a further reduction in handovers in the long run. By incorporating the collaborative effects of the user clusters, the proposed scheme has been shown to outperform its benchmarks in reducing handovers.
Author Contributions
Conceptualization, data curation, investigation, methodology, resources, and simulation, S.M. and S.H.; formal analysis and supervision, K.Z.; visualization and writing—original draft, S.M. and X.L.; validation and writing—review and editing, S.M., S.H., Z.X. and J.M. All authors have read and agreed to the published version of the manuscript.
Funding
This work was supported by the Basic research funds for universities in the Xinjiang Uygur Autonomous Region in 2024 under Grant No. XJEDU2024P064. This work was also supported by the natural science foundation of Xinjiang Uygur Autonomous Region in 2024 under Grant No. 2024D01A12. This work was also supported by the doctoral research projects of Xinjiang Institute of Engineering under Grant No. 2025XGYBQJ31.
Data Availability Statement
Data are contained within the article.
Conflicts of Interest
Author Jihang Mi was employed by the company Digital Technology Company Limited of Aerospace Science and Industry Corp. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Abbreviations
The following abbreviations are used in this manuscript:
| AP | Access point |
| ANH | Average number of handovers |
| ALT | Average lasting time |
| BAPS | MAB-based AP selection |
| COLH | Collaborative online learning-based handover |
| CBA | Clustering of bandits approach |
| CC | Central controller |
| CCI | Co-channel interference |
| FHO | Frequent handovers |
| FBS | Feedback sharing |
| LiFi | Light fidelity |
| MAB | Multi-armed bandit |
| NCOLH | Non-collaborative online learning-based handover |
| VLC/RF | Visible light communication/radio frequency |
| VLC | Visible light communication |
| RF | Radio frequency |
| RWP | Random waypoint mobility |
| RFH | Rate-first handover |
| WiFi | Wireless fidelity |
References
- Arshad, R.; Lampe, L. Stochastic Geometry Analysis of User Mobility in RF/VLC Hybrid Networks. IEEE Trans. Wirel. Commun. 2021, 20, 7404–7419.
- Huang, S.; Chuai, G.; Gao, W. Two-Way Selection Handover Algorithm for Load Balancing in Hybrid VLC-RF Networks. In Proceedings of the 2021 IEEE/CIC International Conference on Communications in China (ICCC), Xiamen, China, 28–30 July 2021; pp. 1065–1070.
- Basnayaka, D.A.; Haas, H. Hybrid RF and VLC Systems: Improving User Data Rate Performance of VLC Systems. In Proceedings of the 2015 IEEE 81st Vehicular Technology Conference (VTC Spring), Glasgow, UK, 11–14 May 2015; pp. 1–5.
- Sun, L.; Hou, J.; Shu, T. Spatial and Temporal Contextual Multi-Armed Bandit Handovers in Ultra-Dense mmWave Cellular Networks. IEEE Trans. Mob. Comput. 2021, 20, 3423–3438.
- Bao, X.; Okine, A.; Shi, L.; Bao, N.; Adjardjah, W. Channel Adaptive Dwell Timer for Vertical Handoff in Hybrid VLC and Wi-Fi Networks. In Proceedings of the 2018 IEEE/CIC International Conference on Communications in China (ICCC), Beijing, China, 16–18 August 2018; pp. 609–613.
- Liu, R.; Zhang, C. Dynamic dwell timer for vertical handover in VLC-WLAN heterogeneous networks. In Proceedings of the 2017 13th International Wireless Communications and Mobile Computing Conference (IWCMC), Valencia, Spain, 26–30 June 2017; pp. 1256–1260.
- Liang, S.; Zhang, Y.; Fan, B.; Tian, H. Multi-Attribute Vertical Handover Decision-Making Algorithm in a Hybrid VLC-Femto System. IEEE Commun. Lett. 2017, 21, 1521–1524.
- Camporez, H.; Costa, W.; Segatto, M.; Silva, J.; Deters, J.K.; Wörtche, H. AI-Driven Enhancements for Handover in Visible Light Communication Systems. J. Light. Technol. 2024, 42, 8191–8202.
- Wu, X.; O’Brien, D.C.; Deng, X.; Linnartz, J.-P.M.G. Smart Handover for Hybrid LiFi and WiFi Networks. IEEE Trans. Wirel. Commun. 2020, 19, 8211–8219.
- Wu, X.; Haas, H. Handover skipping for LiFi. IEEE Access 2019, 7, 38369–38378.
- Wu, X.; O’Brien, D.C. A Novel Machine Learning-Based Handover Scheme for Hybrid LiFi and WiFi Networks. In Proceedings of the 2020 IEEE Globecom Workshops (GC Wkshps), Taipei, Taiwan, 7–11 December 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1–5.
- Arfaoui, M.A.; Ghrayeb, A.; Assi, C. Cascaded Artificial Neural Networks for Proactive Power Allocation in Indoor LiFi Systems. In Proceedings of the ICC 2021-IEEE International Conference on Communications, Montreal, QC, Canada, 14–23 June 2021; pp. 1–6.
- Wu, Z.-Y.; Ismail, M.; Serpedin, E.; Wang, J. Efficient Prediction of Link Outage in Mobile Optical Wireless Communications. IEEE Trans. Wirel. Commun. 2021, 20, 882–896.
- Senaratne, H.; Mueller, M.; Behrisch, M.; Lalanne, F.; Bustos-Jiménez, J.; Schneidewind, J.; Keim, D.; Schreck, T. Urban Mobility Analysis With Mobile Network Data: A Visual Analytics Approach. IEEE Trans. Intell. Transp. Syst. 2018, 19, 1537–1546.
- Peng, Y.; Zhou, Y.; Liu, L.; Li, J.; Pan, Z.; Sun, G. Intelligent Recommendation-Based User Plane Handover With Enhanced TCP Throughput in Ultra-Dense Cellular Networks. IEEE Trans. Veh. Technol. 2022, 71, 595–610.
- Fonseca, D.F.; Guzman, B.G.; Martena, G.L.; Bian, R.; Haas, H.; Giustiniano, D. Prediction-model-assisted reinforcement learning algorithm for handover decision-making in hybrid LiFi and WiFi networks. J. Opt. Commun. Netw. 2024, 16, 159–170.
- Ma, G.; Parthiban, R.; Karmakar, N. An artificial neural network-based handover scheme for hybrid LiFi networks. IEEE Access 2022, 10, 130350–130358.
- Gentile, C.; Li, S.; Kar, P.; Karatzoglou, A.; Zappella, G.; Etrue, E. On context-dependent clustering of bandits. Proc. Int. Conf. Mach. Learn. 2017, 70, 1253–1262.
- Sun, Y.; Feng, G.; Qin, S.; Liang, Y.-C.; Yum, T.-S.P. The SMART Handoff Policy for Millimeter Wave Heterogeneous Cellular Networks. IEEE Trans. Mob. Comput. 2018, 17, 1456–1468.
- Soltani, M.D.; Arfaoui, M.A.; Tavakkolnia, I.; Ghrayeb, A.; Safari, M.; Assi, C.M.; Hasna, M.O.; Haas, H. Bidirectional Optical Spatial Modulation for Mobile Users: Toward a Practical Design for LiFi Systems. IEEE J. Sel. Areas Commun. 2019, 37, 2069–2086.
- Chen, C.; Basnayaka, D.A.; Purwita, A.A.; Wu, X.; Haas, H. Wireless Infrared-Based LiFi Uplink Transmission With Link Blockage and Random Device Orientation. IEEE Trans. Commun. 2021, 69, 1175–1188.
- Demir, M.S.; Uysal, M. A Cross-Layer Design for Dynamic Resource Management of VLC Networks. IEEE Trans. Commun. 2021, 69, 1858–1867.
- Perahia, E.; Stacey, R. Next Generation Wireless LAN: 802.11n and 802.11ac; Cambridge University Press: Cambridge, UK, 2013.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).