Search Results (39)

Search Parameters:
Keywords = multi-armed bandit (MAB)

16 pages, 2246 KiB  
Article
Context-Aware Beam Selection for IRS-Assisted mmWave V2I Communications
by Ricardo Suarez del Valle, Abdulkadir Kose and Haeyoung Lee
Sensors 2025, 25(13), 3924; https://doi.org/10.3390/s25133924 - 24 Jun 2025
Viewed by 521
Abstract
Millimeter wave (mmWave) technology, with its ultra-high bandwidth and low latency, holds significant promise for vehicle-to-everything (V2X) communications. However, it faces challenges such as high propagation losses and limited coverage in dense urban vehicular environments. Intelligent Reflecting Surfaces (IRSs) help address these issues by enhancing mmWave signal paths around obstacles, thereby maintaining reliable communication. This paper introduces a novel Contextual Multi-Armed Bandit (C-MAB) algorithm designed to dynamically adapt beam and IRS selections based on real-time environmental context. Simulation results demonstrate that the proposed C-MAB approach significantly improves link stability, doubling average beam sojourn times compared to traditional SNR-based strategies and standard MAB methods, and achieving up to fourfold gains in scenarios with IRS assistance. This approach enables optimized resource allocation and significantly improves coverage, data rate, and resource utilization compared to conventional methods.
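As a rough illustration of the contextual-bandit idea behind such beam selection (a sketch, not the paper's C-MAB specification), the example below keeps one UCB1 learner per context bin, with arms standing in for candidate beam/IRS choices; `n_contexts`, `n_beams`, and the SNR-like reward model are illustrative assumptions.

```python
import math
import random

# One UCB1 learner per context bin; arms model candidate beam/IRS
# choices, context models e.g. a quantized vehicle position.
# All names and the reward model are illustrative assumptions.
n_contexts, n_beams = 4, 8
counts = [[0] * n_beams for _ in range(n_contexts)]
means = [[0.0] * n_beams for _ in range(n_contexts)]

def select_beam(ctx: int, t: int) -> int:
    """Pick the beam with the highest UCB index for this context."""
    for a in range(n_beams):
        if counts[ctx][a] == 0:        # play every arm once first
            return a
    return max(range(n_beams),
               key=lambda a: means[ctx][a]
               + math.sqrt(2 * math.log(t) / counts[ctx][a]))

def update(ctx: int, a: int, reward: float) -> None:
    counts[ctx][a] += 1
    means[ctx][a] += (reward - means[ctx][a]) / counts[ctx][a]

best = [random.randrange(n_beams) for _ in range(n_contexts)]
for t in range(1, 5001):
    ctx = random.randrange(n_contexts)
    a = select_beam(ctx, t)
    snr_like = (1.0 if a == best[ctx] else 0.2) + random.gauss(0, 0.1)
    update(ctx, a, snr_like)
```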

20 pages, 896 KiB  
Article
MAB-Based Online Client Scheduling for Decentralized Federated Learning in the IoT
by Zhenning Chen, Xinyu Zhang, Siyang Wang and Youren Wang
Entropy 2025, 27(4), 439; https://doi.org/10.3390/e27040439 - 18 Apr 2025
Viewed by 425
Abstract
Unlike conventional federated learning (FL), which relies on a central server for model aggregation, decentralized FL (DFL) exchanges models among edge servers, thus improving robustness and scalability. When deploying DFL in the Internet of Things (IoT), limited wireless resources cannot provide simultaneous access to massive numbers of devices, so client scheduling must be performed to balance the convergence rate and model accuracy. However, the heterogeneity of computing and communication resources across client devices, combined with the time-varying nature of wireless channels, makes it challenging to accurately estimate the delay associated with client participation during the scheduling process. To address this issue, we investigate the client scheduling and resource optimization problem in DFL without prior client information. Specifically, the considered problem is formulated as a multi-armed bandit (MAB) problem, and an online learning algorithm that utilizes contextual multi-armed bandits for client delay estimation and scheduling is proposed. Theoretical analysis shows that the algorithm achieves asymptotically optimal performance. The experimental results show that the algorithm makes asymptotically optimal client selection decisions and outperforms existing algorithms in reducing the cumulative delay of the system.
(This article belongs to the Section Information Theory, Probability and Statistics)
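A minimal sketch of MAB-style client scheduling with delay as the cost, assuming a UCB-type rule that favors clients whose lower confidence bound on delay is smallest; the paper's contextual delay model and DFL training loop are not reproduced.

```python
import math
import random

# Each IoT client is an arm; the observed per-round delay is a cost.
# A lower-confidence-bound rule (optimism toward low delay) schedules
# the next client. The delay model is synthetic.
n_clients = 10
true_delay = [random.uniform(0.5, 2.0) for _ in range(n_clients)]
counts = [0] * n_clients
mean_delay = [0.0] * n_clients

for t in range(1, 2001):
    if t <= n_clients:                 # try every client once
        c = t - 1
    else:
        c = min(range(n_clients),
                key=lambda i: mean_delay[i]
                - math.sqrt(2 * math.log(t) / counts[i]))
    d = true_delay[c] + random.gauss(0, 0.1)   # noisy observed delay
    counts[c] += 1
    mean_delay[c] += (d - mean_delay[c]) / counts[c]
```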

11 pages, 1820 KiB  
Article
Collaborative Online Learning-Based Distributed Handover Scheme in Hybrid VLC/RF 5G Systems
by Saidiwaerdi Maimaiti, Shuman Huang, Kaisa Zhang, Xuewen Liu, Zhiwei Xu and Jihang Mi
Electronics 2025, 14(6), 1142; https://doi.org/10.3390/electronics14061142 - 14 Mar 2025
Cited by 1 | Viewed by 470
Abstract
This paper investigates handover in hybrid visible light communication (VLC)/radio frequency (RF) networks. In such networks, mobile users are prone to experience frequent handovers (FHOs). To this end, we propose a collaborative online learning-based handover scheme (COLH) in hybrid VLC/RF 5G systems. By selecting the next access point (AP) to which a user should hand over, our goal is to keep the user–AP connection alive as long as possible after the handover; this duration is defined as a reward that is learned online through a multi-armed bandit (MAB) framework. Unlike previous schemes based on independent and collective learning, our scheme first dynamically clusters users with similar feedback on a given AP. Second, the users in the same cluster collaborate in estimating the expected reward for that AP, and the AP with the maximum expected reward is selected as the next AP. This scheme can be implemented without extensive offline training or location information, which greatly enhances its practicality. The simulation results show that the proposal outperforms existing benchmarks in reducing handovers.
(This article belongs to the Special Issue New Advances in Distributed Computing and Its Applications)
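A loose sketch of the collaborative step, under the assumption that users whose empirical feedback on an AP is close (within a tolerance) pool their samples to estimate that AP's expected connection time; the clustering rule, tolerance, and sojourn model are illustrative, not the scheme's exact procedure.

```python
import random
from statistics import mean

# samples[u][ap] holds user u's observed post-handover sojourn times
# for each access point (AP). The tolerance-based pooling below is a
# stand-in for the paper's dynamic clustering.
n_users, n_aps = 6, 4
samples = [[[] for _ in range(n_aps)] for _ in range(n_users)]

def cluster_estimate(u: int, ap: int, tol: float = 0.2) -> float:
    """Pool samples from users whose mean feedback on `ap` is near u's."""
    mine = samples[u][ap]
    if not mine:
        return float("inf")            # force exploration of unseen APs
    pool = list(mine)
    for v in range(n_users):
        other = samples[v][ap]
        if v != u and other and abs(mean(other) - mean(mine)) < tol:
            pool.extend(other)
    return mean(pool)

def next_ap(u: int) -> int:
    return max(range(n_aps), key=lambda ap: cluster_estimate(u, ap))

for _ in range(500):
    u = random.randrange(n_users)
    ap = next_ap(u)
    sojourn = random.gauss(1.0 + 0.3 * ap, 0.2)  # synthetic sojourn time
    samples[u][ap].append(sojourn)
```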

53 pages, 1295 KiB  
Review
Selective Reviews of Bandit Problems in AI via a Statistical View
by Pengjie Zhou, Haoyu Wei and Huiming Zhang
Mathematics 2025, 13(4), 665; https://doi.org/10.3390/math13040665 - 18 Feb 2025
Cited by 3 | Viewed by 798
Abstract
Reinforcement Learning (RL) is a widely researched area in artificial intelligence that focuses on teaching agents decision-making through interactions with their environment. A key subset includes multi-armed bandit (MAB) and stochastic continuum-armed bandit (SCAB) problems, which model sequential decision-making under uncertainty. This review outlines the foundational models and assumptions of bandit problems, explores non-asymptotic theoretical tools like concentration inequalities and minimax regret bounds, and compares frequentist and Bayesian algorithms for managing exploration–exploitation trade-offs. Additionally, we explore K-armed contextual bandits and SCAB, focusing on their methodologies and regret analyses. We also examine the connections between SCAB problems and functional data analysis. Finally, we highlight recent advances and ongoing challenges in the field.
(This article belongs to the Special Issue Advances in Statistical AI and Causal Inference)
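As a companion to the review's frequentist-versus-Bayesian theme, a self-contained toy comparison of UCB1 and Bernoulli Thompson sampling on the same K-armed problem; the arm means are arbitrary.

```python
import math
import random

K, T = 5, 10000
p = [0.2, 0.35, 0.5, 0.45, 0.3]        # unknown Bernoulli arm means

def ucb1() -> float:
    """Frequentist index policy: empirical mean plus confidence bonus."""
    n, m, total = [0] * K, [0.0] * K, 0.0
    for t in range(1, T + 1):
        a = t - 1 if t <= K else max(
            range(K), key=lambda i: m[i] + math.sqrt(2 * math.log(t) / n[i]))
        r = 1.0 if random.random() < p[a] else 0.0
        n[a] += 1
        m[a] += (r - m[a]) / n[a]
        total += r
    return total

def thompson() -> float:
    """Bayesian policy: sample from Beta posteriors, play the argmax."""
    alpha, beta, total = [1] * K, [1] * K, 0.0
    for _ in range(T):
        a = max(range(K), key=lambda i: random.betavariate(alpha[i], beta[i]))
        r = 1.0 if random.random() < p[a] else 0.0
        alpha[a] += r
        beta[a] += 1 - r
        total += r
    return total

print(ucb1(), thompson())              # both should approach 0.5 * T
```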

22 pages, 1271 KiB  
Article
Modified Index Policies for Multi-Armed Bandits with Network-like Markovian Dependencies
by Abdalaziz Sawwan and Jie Wu
Network 2025, 5(1), 3; https://doi.org/10.3390/network5010003 - 29 Jan 2025
Viewed by 923
Abstract
Sequential decision-making in dynamic and interconnected environments is a cornerstone of numerous applications, ranging from communication networks and finance to distributed blockchain systems and IoT frameworks. The multi-armed bandit (MAB) problem is a fundamental model in this domain that traditionally assumes independent and identically distributed (iid) rewards, which limits its effectiveness in capturing the inherent dependencies and state dynamics present in some real-world scenarios. In this paper, we lay a theoretical framework for a modified MAB model in which each arm’s reward is generated by a hidden Markov process. In our model, each arm undergoes Markov state transitions independent of play in a way that results in varying reward distributions and heightened uncertainty in reward observations. Each arm can have up to three states. A key challenge arises from the fact that the underlying states governing each arm’s rewards remain hidden at the time of selection. To address this, we adapt traditional index-based policies and develop a modified index approach tailored to accommodate Markovian transitions and enhance selection efficiency for our model. Our proposed Markovian Upper Confidence Bound (MC-UCB) policy achieves logarithmic regret. Comparative analysis with the classical UCB algorithm reveals that MC-UCB consistently achieves approximately a 15% reduction in cumulative regret. This work provides significant theoretical insights and lays a robust foundation for future research aimed at optimizing decision-making processes in complex, networked systems with hidden state dependencies.
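The MC-UCB index itself is not reproduced here; the sketch below only sets up the environment the paper targets (arms driven by hidden, independently transitioning Markov chains with up to three states) and runs a plain UCB1 baseline over it. Transition matrices and per-state reward means are illustrative.

```python
import math
import random

n_arms, n_states = 4, 3
# Sticky chains: each state keeps itself w.p. 0.8, moves w.p. 0.1 each.
P = [[[0.8 if i == j else 0.1 for j in range(n_states)]
      for i in range(n_states)] for _ in range(n_arms)]
state_mean = [[0.2, 0.5, 0.9] for _ in range(n_arms)]
state = [0] * n_arms

def step_env(a: int) -> float:
    """All hidden chains transition (played or not); reward from arm a."""
    for k in range(n_arms):
        state[k] = random.choices(range(n_states), weights=P[k][state[k]])[0]
    return random.gauss(state_mean[a][state[a]], 0.1)

n, m = [0] * n_arms, [0.0] * n_arms
for t in range(1, 5001):
    a = t - 1 if t <= n_arms else max(
        range(n_arms), key=lambda i: m[i] + math.sqrt(2 * math.log(t) / n[i]))
    r = step_env(a)
    n[a] += 1
    m[a] += (r - m[a]) / n[a]
```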

25 pages, 974 KiB  
Article
Thompson Sampling for Non-Stationary Bandit Problems
by Han Qi, Fei Guo and Li Zhu
Entropy 2025, 27(1), 51; https://doi.org/10.3390/e27010051 - 9 Jan 2025
Viewed by 1698
Abstract
Non-stationary multi-armed bandit (MAB) problems have recently attracted extensive attention. We focus on the abruptly changing scenario where reward distributions remain constant for a certain period and change at unknown time steps. Although Thompson sampling (TS) has shown success in non-stationary settings, there is currently no regret bound analysis for TS with uninformative priors. To address this, we propose two algorithms, discounted TS and sliding-window TS, designed for sub-Gaussian reward distributions. For these algorithms, we establish an upper bound for the expected regret by bounding the expected number of times a suboptimal arm is played. We show that the regret upper bounds of both algorithms are $\tilde{O}(\sqrt{T B_T})$, where $T$ is the time horizon and $B_T$ is the number of breakpoints. This upper bound matches the lower bound for abruptly changing problems up to a logarithmic factor. Empirical comparisons with other non-stationary bandit algorithms highlight the competitive performance of our proposed methods.
(This article belongs to the Section Information Theory, Probability and Statistics)
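A minimal sketch of the discounted-TS idea for abruptly changing rewards, assuming Gaussian rewards and an approximate Gaussian posterior whose sufficient statistics are geometrically discounted; the discount factor, noise scale, and single breakpoint are illustrative, not the paper's settings.

```python
import math
import random

K, T, gamma, sigma = 3, 6000, 0.99, 0.5
S = [0.0] * K                          # discounted reward sums
N = [1e-6] * K                         # discounted pull counts

means = [0.2, 0.5, 0.8]
for t in range(T):
    if t == T // 2:                    # abrupt change (one breakpoint)
        means = means[::-1]
    # Draw from each arm's approximate Gaussian posterior, play argmax.
    theta = [random.gauss(S[i] / N[i], sigma / math.sqrt(N[i]))
             for i in range(K)]
    a = max(range(K), key=lambda i: theta[i])
    r = random.gauss(means[a], sigma)
    for i in range(K):                 # discounting forgets old rewards
        S[i] *= gamma
        N[i] *= gamma
    S[a] += r
    N[a] += 1.0
```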

20 pages, 351 KiB  
Article
Multilevel Constrained Bandits: A Hierarchical Upper Confidence Bound Approach with Safety Guarantees
by Ali Baheri
Mathematics 2025, 13(1), 149; https://doi.org/10.3390/math13010149 - 3 Jan 2025
Cited by 2 | Viewed by 1490
Abstract
The multi-armed bandit (MAB) problem is a foundational model for sequential decision-making under uncertainty. While MAB has proven valuable in applications such as clinical trials and online advertising, traditional formulations have limitations; specifically, they struggle to handle three key real-world scenarios: (1) when decisions must follow a hierarchical structure (as in autonomous systems where high-level strategy guides low-level actions); (2) when there are constraints at multiple levels of decision-making (such as both system-wide and component-level resource limits); and (3) when available actions depend on previous choices or context. To address these challenges, we introduce the hierarchical constrained bandits (HCB) framework, which extends contextual bandits to incorporate both hierarchical decisions and multilevel constraints. We propose the HC-UCB (hierarchical constrained upper confidence bound) algorithm to solve the HCB problem. The algorithm uses confidence bounds within a hierarchical setting to balance exploration and exploitation while respecting constraints at all levels. Our theoretical analysis establishes that HC-UCB achieves sublinear regret, guarantees constraint satisfaction at all hierarchical levels, and is near-optimal in terms of achievable performance. Simple experimental results demonstrate the algorithm’s effectiveness in balancing reward maximization with constraint satisfaction.
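A loose two-level sketch of the hierarchical-constrained setting (not the HC-UCB algorithm itself): a high-level UCB chooses a strategy, a low-level UCB chooses an action within it, and actions whose cost exceeds a budget are filtered out. Per-action costs are assumed known here, a simplification of the paper's constraint handling.

```python
import math
import random

H, L, budget = 3, 4, 0.6                # strategies, actions, cost budget
nh, mh = [0] * H, [0.0] * H             # high-level counts / means
nl = [[0] * L for _ in range(H)]        # low-level counts
ml = [[0.0] * L for _ in range(H)]      # low-level means
cost = [[random.uniform(0.2, 0.9) for _ in range(L)] for _ in range(H)]

def ucb(n: list, m: list, t: int) -> list:
    return [float("inf") if n[i] == 0
            else m[i] + math.sqrt(2 * math.log(t) / n[i])
            for i in range(len(n))]

for t in range(1, 3001):
    h = max(range(H), key=lambda i: ucb(nh, mh, t)[i])
    feasible = [a for a in range(L) if cost[h][a] <= budget] or list(range(L))
    idx = ucb(nl[h], ml[h], t)
    a = max(feasible, key=lambda i: idx[i])
    r = random.gauss(0.3 + 0.1 * h + 0.1 * a, 0.1)   # synthetic reward
    nh[h] += 1; mh[h] += (r - mh[h]) / nh[h]
    nl[h][a] += 1; ml[h][a] += (r - ml[h][a]) / nl[h][a]
```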

27 pages, 2484 KiB  
Article
Secure Dynamic Scheduling for Federated Learning in Underwater Wireless IoT Networks
by Lei Yan, Lei Wang, Guanjun Li, Jingwei Shao and Zhixin Xia
J. Mar. Sci. Eng. 2024, 12(9), 1656; https://doi.org/10.3390/jmse12091656 - 16 Sep 2024
Cited by 2 | Viewed by 1378
Abstract
Federated learning (FL) is a distributed machine learning approach that can enable Internet of Things (IoT) edge devices to collaboratively learn a machine learning model without explicitly sharing local data in order to achieve data clustering, prediction, and classification in networks. In previous works, some online multi-armed bandit (MAB)-based FL frameworks were proposed to enable dynamic client scheduling for improving the efficiency of FL in underwater wireless IoT networks. However, the security of online dynamic scheduling, which is especially essential for underwater wireless IoT, is increasingly being questioned. In this work, we study secure dynamic scheduling for FL frameworks that can protect against malicious clients in underwater FL-assisted wireless IoT networks. Specifically, in order to jointly optimize the communication efficiency and security of FL, we employ MAB-based methods and propose upper-confidence-bound-based smart contracts (UCB-SCs) and upper-confidence-bound-based smart contracts with a security prediction model (UCB-SCPs) to address the optimal scheduling scheme over time-varying underwater channels. Then, we give the upper bounds of the expected performance regret of the UCB-SC policy and the UCB-SCP policy; these upper bounds imply that the regret of the two proposed policies grows logarithmically over communication rounds under certain conditions. Our experiment shows that the proposed UCB-SC and UCB-SCP approaches significantly improve the efficiency and security of FL frameworks in underwater wireless IoT networks.
(This article belongs to the Special Issue Underwater Wireless Communications: Recent Advances and Challenges)

16 pages, 2850 KiB  
Article
Multi-Armed Bandit-Based User Network Node Selection
by Qinyan Gao and Zhidong Xie
Sensors 2024, 24(13), 4104; https://doi.org/10.3390/s24134104 - 24 Jun 2024
Cited by 1 | Viewed by 1181
Abstract
In the scenario of an integrated space–air–ground emergency communication network, users encounter the challenge of rapidly identifying the optimal network node amidst the uncertainty and stochastic fluctuations of network states. This study introduces a Multi-Armed Bandit (MAB) model and proposes an optimization algorithm leveraging dynamic variance sampling (DVS). The algorithm posits that the prior distribution of each node’s network state conforms to a normal distribution, and by constructing the distribution’s expected value and variance, it maximizes the utilization of sample data, thereby maintaining an equilibrium between data exploitation and the exploration of the unknown. Theoretical substantiation is provided to illustrate that the Bayesian regret associated with the algorithm exhibits sublinear growth. Empirical simulations corroborate that the algorithm in question outperforms traditional ε-greedy, Upper Confidence Bound (UCB), and Thompson sampling algorithms in terms of higher cumulative rewards, diminished total regret, accelerated convergence rates, and enhanced system throughput.
(This article belongs to the Section Physical Sensors)
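A minimal sketch of the sampling idea as the abstract describes it: each node's reward is modeled as normal, with mean and variance built from observed samples, and the node with the largest posterior draw is selected. The update details are assumptions, not the exact DVS procedure.

```python
import math
import random

n_nodes = 6
obs = [[] for _ in range(n_nodes)]      # per-node observed rewards

def draw(i: int) -> float:
    """Thompson-style draw using the sample mean and sample variance."""
    if len(obs[i]) < 2:
        return float("inf")             # explore barely-seen nodes first
    mu = sum(obs[i]) / len(obs[i])
    var = sum((x - mu) ** 2 for x in obs[i]) / (len(obs[i]) - 1)
    return random.gauss(mu, math.sqrt(var / len(obs[i])))

for _ in range(2000):
    node = max(range(n_nodes), key=draw)
    reward = random.gauss(0.4 + 0.1 * node, 0.3)  # synthetic network state
    obs[node].append(reward)
```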

14 pages, 4717 KiB  
Article
Exploring Multi-Armed Bandit (MAB) as an AI Tool for Optimising GMA-WAAM Path Planning
by Rafael Pereira Ferreira, Emil Schubert and Américo Scotti
J. Manuf. Mater. Process. 2024, 8(3), 99; https://doi.org/10.3390/jmmp8030099 - 15 May 2024
Cited by 2 | Viewed by 2244
Abstract
Conventional path-planning strategies for GMA-WAAM may encounter challenges related to geometrical features when printing complex-shaped builds. One alternative to mitigate geometry-related flaws is to use algorithms that optimise trajectory choices—for instance, using heuristics to find the most efficient trajectory. The algorithm can assess several trajectory strategies, such as contour, zigzag, raster, and even space-filling, to search for the best strategy according to the case. However, handling complex geometries by this means poses computational efficiency concerns. This research aimed to explore the potential of machine learning techniques as a solution to increase the computational efficiency of such algorithms. First, reinforcement learning (RL) concepts are introduced and compared with supervised machine learning concepts. The Multi-Armed Bandit (MAB) problem is explained and justified as a choice within the RL techniques. As a case study, a space-filling strategy was chosen to incorporate this machine learning optimisation artifice into its algorithm for GMA-AM printing. Computational and experimental validations were conducted, demonstrating that adding MAB to the algorithm helped to achieve shorter trajectories, using fewer iterations than the original algorithm, potentially reducing printing time. These findings position RL techniques, particularly MAB, as a promising machine learning solution to address setbacks in the space-filling strategy applied.
(This article belongs to the Special Issue Advances in Directed Energy Deposition Additive Manufacturing)
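A hedged sketch of using a bandit inside a path planner: candidate trajectory strategies are arms and the negative trajectory length is the reward, so an epsilon-greedy learner biases the search toward strategies that shortened past trajectories. The strategy set and length model are illustrative, not the paper's space-filling algorithm.

```python
import random

strategies = ["contour", "zigzag", "raster", "space_filling"]
n = {s: 0 for s in strategies}          # pull counts
m = {s: 0.0 for s in strategies}        # mean rewards

def pick(eps: float = 0.1) -> str:
    if random.random() < eps or not all(n.values()):
        return random.choice(strategies)   # explore
    return max(strategies, key=lambda s: m[s])

true_len = {"contour": 12, "zigzag": 10, "raster": 11, "space_filling": 9}
for _ in range(300):
    s = pick()
    length = random.gauss(true_len[s], 1.0)   # synthetic trajectory length
    n[s] += 1
    m[s] += (-length - m[s]) / n[s]           # reward = negative path length
```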

18 pages, 1434 KiB  
Article
Dynamic Grouping within Minimax Optimal Strategy for Stochastic Multi-Armed Bandits in Reinforcement Learning Recommendation
by Jiamei Feng, Junlong Zhu, Xuhui Zhao and Zhihang Ji
Appl. Sci. 2024, 14(8), 3441; https://doi.org/10.3390/app14083441 - 18 Apr 2024
Cited by 1 | Viewed by 1245
Abstract
The multi-armed bandit (MAB) problem is a typical problem of exploration and exploitation. As a classical MAB problem, the stochastic multi-armed bandit (SMAB) is the basis of reinforcement learning recommendation. However, most existing SMAB and MAB algorithms have two limitations: (1) they do not make full use of feedback from the environment or agent, such as the number of arms and rewards contained in user feedback; (2) they overlook the utilization of different action selections, which can affect the exploration and exploitation of the algorithm. These limitations motivate us to propose a novel dynamic grouping within the minimax optimal strategy in the stochastic case (DG-MOSS) algorithm for reinforcement learning recommendation in small and medium-sized data scenarios. DG-MOSS does not require additional contextual data and can be used for recommendation of various types of data. Specifically, we designed a new exploration calculation method based on dynamic grouping which uses the feedback information automatically in the selection process and adopts different action selections. To train the algorithm thoroughly, we designed an adaptive episode length that effectively improves training efficiency. We also analyzed and proved the upper bound of DG-MOSS’s regret. Experimental results on datasets of different scales, densities, and fields show that DG-MOSS yields greater rewards than nine baselines when sufficiently trained and demonstrates better robustness.
(This article belongs to the Section Computing and Artificial Intelligence)
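For reference, a sketch of the standard MOSS index that DG-MOSS builds on, whose exploration bonus is scaled by the horizon T and arm count K; the dynamic-grouping and adaptive-episode layers of DG-MOSS are not reproduced.

```python
import math
import random

K, T = 4, 8000
mu = [0.3, 0.5, 0.6, 0.4]               # unknown Bernoulli arm means
n, m = [0] * K, [0.0] * K

def moss_index(i: int) -> float:
    if n[i] == 0:
        return float("inf")             # play every arm once
    bonus = math.sqrt(max(0.0, math.log(T / (K * n[i]))) / n[i])
    return m[i] + bonus

for _ in range(T):
    a = max(range(K), key=moss_index)
    r = 1.0 if random.random() < mu[a] else 0.0
    n[a] += 1
    m[a] += (r - m[a]) / n[a]
```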

16 pages, 1365 KiB  
Article
A Fairness-Enhanced Federated Learning Scheduling Mechanism for UAV-Assisted Emergency Communication
by Chun Zhu, Ying Shi, Haitao Zhao, Keqi Chen, Tianyu Zhang and Chongyu Bao
Sensors 2024, 24(5), 1599; https://doi.org/10.3390/s24051599 - 29 Feb 2024
Cited by 5 | Viewed by 1869
Abstract
As the frequency of natural disasters increases, the study of emergency communication becomes increasingly important. The use of federated learning (FL) in this scenario can facilitate communication collaboration between devices while protecting privacy, greatly improving system performance. Considering the complex geographic environment, the flexible mobility and large communication radius of unmanned aerial vehicles (UAVs) make them ideal auxiliary devices for wireless communication. Using a UAV as a mobile base station can better provide stable communication signals. However, ground-based IoT terminals are numerous and closely distributed, so if all of them transmit data to the UAV, the UAV will not be able to take on all of the computation and communication tasks because of its limited energy. In addition, terrestrial devices compete for spectrum resources, and having all devices transmit data would cause an extreme shortage of resources and degrade model performance. This would severely impair rescue operations in the disaster area and threaten the lives of the vulnerable and injured. Therefore, we use user scheduling to select some terrestrial devices to participate in the FL process. To avoid the resource waste incurred by predicting terrestrial device resources, we use a multi-armed bandit (MAB) algorithm for device evaluation. To address fairness in selection, we replace a single criterion with multiple criteria, using a weighted combination of model freshness and energy consumption as the reward function. The state-of-the-art performance of our approach is demonstrated by simulations on the datasets.
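A minimal sketch of the multi-criteria reward described above, assuming a weighted combination of model staleness (favoring devices not scheduled recently) and energy cost drives a UCB-style device evaluation; the weights and both component models are illustrative.

```python
import math
import random

n_dev, w_fresh, w_energy = 8, 0.7, 0.3
last_round = [0] * n_dev                # when each device last participated
n, m = [0] * n_dev, [0.0] * n_dev

def reward(dev: int, t: int) -> float:
    staleness = 1.0 - 1.0 / (1 + t - last_round[dev])  # higher if idle longer
    energy = random.uniform(0.1, 1.0)                  # synthetic energy cost
    return w_fresh * staleness + w_energy * (1.0 - energy)

for t in range(1, 1501):
    if t <= n_dev:                      # schedule every device once
        d = t - 1
    else:
        d = max(range(n_dev),
                key=lambda i: m[i] + math.sqrt(2 * math.log(t) / n[i]))
    r = reward(d, t)
    last_round[d] = t
    n[d] += 1
    m[d] += (r - m[d]) / n[d]
```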

30 pages, 2295 KiB  
Article
An Integrated GIS-Based Reinforcement Learning Approach for Efficient Prediction of Disease Transmission in Aquaculture
by Aristeidis Karras, Christos Karras, Spyros Sioutas, Christos Makris, George Katselis, Ioannis Hatzilygeroudis, John A. Theodorou and Dimitrios Tsolis
Information 2023, 14(11), 583; https://doi.org/10.3390/info14110583 - 24 Oct 2023
Cited by 7 | Viewed by 4224
Abstract
This study explores the design and capabilities of a Geographic Information System (GIS) incorporated with an expert knowledge system, tailored for tracking and monitoring the spread of dangerous diseases across a collection of fish farms. Specifically targeting the aquacultural regions of Greece, the system captures geographical and climatic data pertinent to these farms. A feature of this system is its ability to calculate disease transmission intervals between individual cages and broader fish farm entities, providing crucial insights into the spread dynamics. These data then act as an entry point to our expert system. To enhance the predictive precision, we employed various machine learning strategies, ultimately focusing on a reinforcement learning (RL) environment. This RL framework, enhanced by the Multi-Armed Bandit (MAB) technique, stands out as a powerful mechanism for effectively managing the flow of virus transmissions within farms. Empirical tests highlight the efficiency of the MAB approach, which, in direct comparisons, consistently outperformed other algorithmic options, achieving an impressive accuracy rate of 96%. Looking ahead to future work, we plan to integrate buffer techniques and delve deeper into advanced RL models to enhance our current system. The results set the stage for future research in predictive modeling within aquaculture health management, and we aim to extend our research even further.
(This article belongs to the Special Issue Second Edition of Predictive Analytics and Data Science)

17 pages, 482 KiB  
Article
Traffic Management in IoT Backbone Networks Using GNN and MAB with SDN Orchestration
by Yanmin Guo, Yu Wang, Faheem Khan, Abdullah A. Al-Atawi, Abdulwahid Al Abdulwahid, Youngmoon Lee and Bhaskar Marapelli
Sensors 2023, 23(16), 7091; https://doi.org/10.3390/s23167091 - 10 Aug 2023
Cited by 17 | Viewed by 5000
Abstract
Traffic management is a critical task in software-defined IoT networks (SDN-IoTs) to efficiently manage network resources and ensure Quality of Service (QoS) for end-users. However, traditional traffic management approaches based on queuing theory or static policies may not be effective due to the dynamic and unpredictable nature of network traffic. In this paper, we propose a novel approach that leverages Graph Neural Networks (GNNs) and multi-armed bandit algorithms to dynamically optimize traffic management policies based on real-time network traffic patterns. Specifically, our approach uses a GNN model to learn and predict network traffic patterns and a multi-armed bandit algorithm to optimize traffic management policies based on these predictions. We evaluate the proposed approach on three different datasets, including a simulated corporate network (KDD Cup 1999), a collection of network traffic traces (CAIDA), and a simulated network environment with both normal and malicious traffic (NSL-KDD). The results demonstrate that our approach outperforms other state-of-the-art traffic management methods, achieving higher throughput, lower packet loss, and lower delay, while effectively detecting anomalous traffic patterns. The proposed approach offers a promising solution to traffic management in SDNs, enabling efficient resource management and QoS assurance.
(This article belongs to the Special Issue Advanced Technologies in Sensor Networks and Internet of Things)

15 pages, 814 KiB  
Article
A Multiarmed Bandit Approach for LTE-U/Wi-Fi Coexistence in a Multicell Scenario
by Iago Diógenes do Rego, José M. de Castro Neto, Sildolfo F. G. Neto, Pedro M. de Santana, Vicente A. de Sousa, Dario Vieira and Augusto Venâncio Neto
Sensors 2023, 23(15), 6718; https://doi.org/10.3390/s23156718 - 27 Jul 2023
Cited by 2 | Viewed by 1757
Abstract
Recent studies and literature reviews have shown promising results for 3GPP system solutions in unlicensed bands when coexisting with Wi-Fi, either by using the duty cycle (DC) approach or licensed-assisted access (LAA). However, it is widely known that general performance in these coexistence scenarios depends on the traffic and on how the duty cycle is adjusted. Most DC solutions configure their parameters statically, which can result in performance losses when the offered traffic changes. In our previous works, we demonstrated that reinforcement learning (RL) techniques can be used to adjust DC parameters. We showed that a Q-learning (QL) solution that adapts the LTE DC ratio to the transmitted data rate can maximize the Wi-Fi/LTE-Unlicensed (LTE-U) aggregated throughput. In this paper, we extend our previous solution by implementing a simpler and more efficient algorithm based on multiarmed bandit (MAB) theory. We evaluate its performance and compare it with the previous one in different traffic scenarios. The results demonstrate that our new solution offers improved balance in throughput, providing similar results for LTE and Wi-Fi, while still showing a substantial system gain. Moreover, in one of the scenarios, our solution outperforms the previous approach by 6% in system throughput. In terms of user throughput, it achieves more than 100% gain for the users at the 10th percentile of performance, while the old solution only achieves a 10% gain.
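A hedged sketch of the MAB formulation: candidate duty-cycle ratios are the arms and the observed aggregated Wi-Fi plus LTE-U throughput is the reward. The throughput model below is synthetic, not the paper's simulation environment.

```python
import math
import random

ratios = [0.2, 0.35, 0.5, 0.65, 0.8]    # candidate LTE-U duty-cycle ratios
n = [0] * len(ratios)
m = [0.0] * len(ratios)

def throughput(r: float) -> float:
    """Synthetic aggregate throughput (Mbps): LTE-U grows with r,
    Wi-Fi shrinks and suffers extra contention as r grows."""
    lte = 40 * r
    wifi = 35 * (1 - r) * (1 - 0.2 * r)
    return lte + wifi + random.gauss(0, 1.5)

for t in range(1, 2001):
    if t <= len(ratios):                # try every ratio once
        a = t - 1
    else:
        eps = 0.2 / math.sqrt(t)        # decaying exploration
        a = (random.randrange(len(ratios)) if random.random() < eps
             else max(range(len(ratios)), key=lambda i: m[i]))
    n[a] += 1
    m[a] += (throughput(ratios[a]) - m[a]) / n[a]
```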