Search Results (108)

Search Parameters:
Keywords = multi-armed bandits

23 pages, 677 KB  
Article
Hierarchical MAB Framework for Energy-Aware Beam Training for Near-Field Communications
by Yunxing Xiang, Yi Yan, Yunchao Song, Jing Gao, Xiaohui You, Jun Wang, Huibin Liang and Yixin Jiang
Sensors 2026, 26(1), 60; https://doi.org/10.3390/s26010060 - 21 Dec 2025
Viewed by 247
Abstract
For XL-MIMO multi-user frequency division duplex systems, this paper proposes a near-field beam training scheme based on a two-phase combinatorial multi-armed bandit (MAB) framework. The scheme integrates energy-aware user scheduling with hierarchical beam training to balance communication quality against device battery level, thereby improving system energy efficiency and extending device lifetime. In the first phase, we account for user battery levels by designing an energy-aware upper confidence bound (UCB) algorithm for user scheduling. This algorithm balances exploration and exploitation, prioritizing users with higher achievable rates and sufficient battery levels. In the second phase, two UCB algorithms are employed for beam training of the scheduled users. In the first layer, beam scanning based on a discrete Fourier transform (DFT) codebook is combined with a UCB algorithm to acquire initial angle information for the scheduled users. In the second layer, a candidate set of polar-domain codewords is constructed from the obtained angle information, and another UCB algorithm selects the optimal polar-domain codewords. Simulations confirm the effectiveness of the scheme, showing notable achievable-rate gains for multi-user communications.
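The UCB rule that both phases build on can be sketched in a few lines of Python. This is the textbook UCB1 index, not the authors' algorithm; the optional `battery` argument is a hypothetical way to make the index energy-aware (scaling it by a battery level in [0, 1]) and is our assumption, since the abstract does not give the paper's formula.

```python
import math

def ucb_select(counts, means, t, battery=None, c=2.0):
    """Return the index of the arm (e.g. a user) with the largest UCB1 index.

    counts[i]: number of times arm i has been selected
    means[i]:  empirical mean reward of arm i (e.g. achievable rate)
    t:         current round, 1-based
    battery:   optional per-arm weights in [0, 1]; a HYPOTHETICAL energy-aware
               weighting, not the paper's formulation
    """
    # Select every arm once before confidence bounds are defined.
    for i, n in enumerate(counts):
        if n == 0:
            return i
    best, best_val = 0, -math.inf
    for i, (n, mu) in enumerate(zip(counts, means)):
        index = mu + math.sqrt(c * math.log(t) / n)  # mean + exploration bonus
        if battery is not None:
            index *= battery[i]  # hypothetical weighting (assumption)
        if index > best_val:
            best, best_val = i, index
    return best


def update(counts, means, arm, reward):
    """Incrementally update the count and running mean of the selected arm."""
    counts[arm] += 1
    means[arm] += (reward - means[arm]) / counts[arm]
```

With `c=2.0` the bonus is the classic sqrt(2 ln t / n) term; a low battery weight can push a high-rate user below a user with more remaining energy, which is the kind of trade-off the abstract describes.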

17 pages, 406 KB  
Article
Spectral Efficiency Beamforming Scheme for UAV MIMO Communication via Budgeted Combinatorial Multi-Armed Bandit
by Jing Gao, Yunxing Xiang, Yunchao Song, Jing Zhu, Jun Wang, Xiaohui You, Ge Wang and Tianbao Gao
Electronics 2025, 14(24), 4805; https://doi.org/10.3390/electronics14244805 - 6 Dec 2025
Viewed by 222
Abstract
Unmanned aerial vehicles (UAVs) equipped with antenna arrays can deliver high-capacity, high-throughput, and low-latency communication services. For a UAV-assisted mmWave multiple-input multiple-output (MIMO) system, a two-stage beamforming scheme based on a budgeted combinatorial multi-armed bandit (BC-MAB) is proposed to improve the system’s spectral efficiency (SE). The pre-beamformer design problem is first formulated as a BC-MAB problem, in which the reward is the received energy, the cost is the energy consumed by each RF chain, and the budget is the residual energy of the UAV. To achieve a favorable trade-off between the number of communication slots and the energy acquired per slot, a pre-beamforming scheme based on the bang-per-buck ratio is introduced to optimize the number of activated RF chains, thereby maximizing the cumulative reward. The second stage uses the reduced-dimensional instantaneous channel state information to design the beamformer that maximizes system SE. Simulations show that the proposed scheme improves SE by more than 7.1% compared with the benchmark schemes.
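The bang-per-buck idea can be illustrated with a greedy selection that ranks arms by estimated reward per unit cost and activates them while the budget allows. The sketch below is a hypothetical helper under simplified assumptions (known per-arm reward estimates, positive costs, a single budget); it is not the paper's pre-beamforming algorithm.

```python
def bang_per_buck_select(rewards, costs, budget):
    """HYPOTHETICAL helper: activate arms in decreasing reward/cost order within a budget.

    rewards[i]: estimated reward of arm i (e.g. received energy via RF chain i)
    costs[i]:   positive cost of activating arm i (e.g. its energy consumption)
    budget:     total cost allowed (e.g. residual UAV energy for this decision)
    """
    # Rank arms by reward per unit cost ("bang per buck"), best first.
    order = sorted(range(len(rewards)), key=lambda i: rewards[i] / costs[i], reverse=True)
    chosen, spent = [], 0.0
    for i in order:
        # Skip arms that would exceed the budget but keep scanning the rest.
        if spent + costs[i] <= budget:
            chosen.append(i)
            spent += costs[i]
    return chosen
```

Ranking by ratio rather than raw reward favors arms that deliver the most reward per unit of energy. A greedy ratio ordering is a heuristic and is not guaranteed to be optimal for the underlying knapsack-type problem.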

15 pages, 1380 KB  
Article
Optimizing LoRaWAN Performance Through Learning Automata-Based Channel Selection
by Luka Aime Atadet, Richard Musabe, Eric Hitimana and Omar Gatera
Future Internet 2025, 17(12), 555; https://doi.org/10.3390/fi17120555 - 2 Dec 2025
Viewed by 266
Abstract
The rising demand for long-range, low-power wireless communication in applications such as monitoring, smart metering, and wide-area sensor networks has highlighted the need for efficient spectrum utilization in LoRaWAN (Long Range Wide Area Network). In response, this paper proposes a novel channel selection framework based on Hierarchical Discrete Pursuit Learning Automata (HDPA), aimed at enhancing the adaptability and reliability of LoRaWAN operations in dynamic, interference-prone environments. HDPA uses a tree-structured reinforcement learning model that monitors transmission success in real time and dynamically updates channel probabilities based on environmental feedback. Simulations conducted in MATLAB R2023b show that HDPA significantly outperforms conventional algorithms such as Hierarchical Continuous Pursuit Automata (HCPA) in convergence speed, selection accuracy, and throughput. Specifically, in an eight-channel setup, HDPA achieved 98.78% accuracy with a mean convergence of 6279 iterations, compared with 93.89% accuracy and 6778 iterations for HCPA. Unlike the Tug-of-War-based Multi-Armed Bandit strategy, which emphasizes fairness in real-world heterogeneous networks, HDPA offers a computationally lightweight and highly adaptive solution tailored to LoRaWAN’s stochastic channel dynamics. These results position HDPA as a promising framework for improving reliability and spectrum utilization in future IoT deployments.
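The pursuit mechanism behind HDPA can be illustrated with a flat (non-hierarchical) discretized pursuit automaton: each channel's success rate is estimated, and after every successful transmission the probability mass moves in fixed discrete steps toward the channel with the highest estimate. This is a simplified sketch; the class name, the `resolution` parameter, and the reward-inaction update are our assumptions rather than the paper's HDPA, which arranges such automata in a tree.

```python
import random

class DiscretePursuitAutomaton:
    """Flat discretized pursuit automaton over r channels (NOT the hierarchical HDPA).

    resolution is an assumed tuning parameter; the step size is 1 / (r * resolution).
    """

    def __init__(self, r, resolution=100, seed=0):
        self.r = r
        self.delta = 1.0 / (r * resolution)   # discrete probability step
        self.p = [1.0 / r] * r                # channel selection probabilities
        self.successes = [0] * r
        self.trials = [0] * r
        self.rng = random.Random(seed)

    def choose(self):
        """Sample a channel according to the current probability vector."""
        return self.rng.choices(range(self.r), weights=self.p)[0]

    def update(self, channel, success):
        """Record feedback; move probability toward the best-estimated channel on success."""
        self.trials[channel] += 1
        self.successes[channel] += int(success)
        if not success:
            return  # reward-inaction (assumption): probabilities change only on success
        est = [s / t if t else 0.0 for s, t in zip(self.successes, self.trials)]
        best = max(range(self.r), key=lambda i: est[i])
        for i in range(self.r):
            if i != best:
                self.p[i] = max(self.p[i] - self.delta, 0.0)
        # The pursued channel absorbs the mass removed from the others.
        self.p[best] = 1.0 - sum(p for i, p in enumerate(self.p) if i != best)
```

In use, `choose()` picks the channel for the next transmission and `update(channel, success)` feeds back whether that transmission succeeded.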

16 pages, 3342 KB  
Article
Geoscientific Input Feature Selection for CNN-Driven Mineral Prospectivity Mapping
by Arya Kimiaghalam, Kyubo Noh and Andrei Swidinsky
Minerals 2025, 15(12), 1237; https://doi.org/10.3390/min15121237 - 23 Nov 2025
Viewed by 402
Abstract
In recent years, machine learning techniques such as convolutional neural networks (CNNs) have been used for mineral prospectivity mapping. Because a diverse range of geoscientific data is often available for training, selecting the subset of features that optimizes model performance is computationally challenging. Our study demonstrates the effect of optimal input feature selection on CNN performance in mineral prospectivity mapping applications. We present results from both exhaustive and algorithmic feature selection methods in the context of copper porphyry prospectivity modeling and analyze the performance and stability of the optimally trained models. On the QUEST dataset from central interior British Columbia, the algorithmic feature selection technique improves model performance by 6.8% over models that use all available features, while consuming only about 2.2% of the computational resources needed to exhaustively search for the optimal feature subset.
(This article belongs to the Special Issue Feature Papers in Mineral Exploration Methods and Applications 2025)

23 pages, 1841 KB  
Article
Population-Level Analysis of Personalized Food Recommendation Using Reinforcement Learning
by Yone Tellechea, Markel Arrojo, Ander Cejudo and Cristina Martin
Foods 2025, 14(21), 3770; https://doi.org/10.3390/foods14213770 - 3 Nov 2025
Viewed by 1031
Abstract
This paper introduces an innovative methodology for optimizing recommendation strategies across different populations within the food industry. While previous approaches to recommending menu courses have overlooked cultural and age-based preferences, our work demonstrates how understanding these differences can significantly increase the appeal of recommendations to consumers and create new marketing opportunities. By simulating diverse populations with a fuzzy logic approach based on individual characteristics such as age, gender, geographical area, and city size, the study evaluates how recommendation algorithms perform on a generated menu database. Results show that algorithms such as State–Action–Reward–State–Action (SARSA), multi-armed bandit (MAB), and Deep Q-Network (DQN) exhibit different levels of efficiency depending on the population. Notably, the DQN improves accumulated reward over a random recommender by 71.60% for “Foodies”, 65.02% for “Veggies”, 63.46% for “Spanish”, and 8.89% for “Seniors”, while MAB achieves similar performance with fewer resources. Statistically significant differences (p < 0.005) in DQN performance are found between populations, with large effect sizes according to Cliff’s delta. These findings highlight recommender systems as an opportunity to navigate market demand, optimize supply chains, and reduce food waste, since a better understanding of public preferences enables more effective alignment of supply and demand across the entire food supply chain. In conclusion, while the DQN effectively captures target group preferences, the optimal recommendation strategy should be chosen by balancing algorithmic performance, computational efficiency, and the specific requirements of the food sector.
(This article belongs to the Special Issue Artificial Intelligence for the Food Industry)

35 pages, 10688 KB  
Article
Multi-Armed Bandit Optimization for Explainable AI Models in Chronic Kidney Disease Risk Evaluation
by Jianbo Huang, Long Li and Jia Chen
Symmetry 2025, 17(11), 1808; https://doi.org/10.3390/sym17111808 - 27 Oct 2025
Viewed by 691
Abstract
Chronic kidney disease (CKD) affects over 850 million people globally and is a critical public health issue, yet existing risk assessment methods inadequately address the complexity of disease progression trajectories. Traditional machine learning approaches face critical limitations, including inefficient hyperparameter selection and a lack of clinical transparency, which hinder their deployment in healthcare settings. This study introduces a computational framework that integrates adaptive Multi-Armed Bandit (MAB) strategies with BorderlineSMOTE sampling to improve CKD risk assessment. The proposed methodology uses XGBoost within an ensemble learning paradigm enhanced by an Upper Confidence Bound exploration strategy, coupled with an interpretability system incorporating SHAP and LIME to ensure model transparency. To address algorithmic interpretability while maintaining clinical utility, a four-level risk categorization framework was developed using cross-validated stratification and balanced performance evaluation metrics, ensuring fair predictive accuracy across diverse patient populations and minimizing bias toward dominant risk categories. In an empirical evaluation on clinical datasets, we compared the framework against sixteen established algorithms using paired statistical tests with Bonferroni correction. The MAB-optimized framework achieved an accuracy of 91.8%, an F1-score of 91.0%, and a ROC-AUC of 97.8%, outperforming the evaluated reference algorithms (p < 0.001). It also delivered a nearly tenfold computational efficiency gain relative to conventional grid search while preserving robust classification performance. Feature importance analysis identified the albumin-to-creatinine ratio, eGFR measurements, and CKD stage as dominant prognostic factors, consistent with established clinical nephrology practice. This research addresses three core limitations of healthcare artificial intelligence, namely the computational cost of optimization, model interpretability, and consistent performance across heterogeneous clinical populations, offering a practical solution for improved CKD risk stratification in clinical practice.

18 pages, 3038 KB  
Article
A Multi-Objective Metaheuristic and Multi-Armed Bandit Hybrid-Based Multi-Corridor Coupled TTC Calculation Method
by Zengjie Sun, Wenle Song, Lei Wang and Jiahao Zhang
Electronics 2025, 14(20), 4075; https://doi.org/10.3390/electronics14204075 - 16 Oct 2025
Viewed by 383
Abstract
The calculation of Total Transfer Capability (TTC) for transmission corridors is the foundation for security region determination and electricity market transactions. However, existing TTC methods often neglect correlations between corridors, leading to overly optimistic results. TTC computation involves complex stability verification and requires enumerating numerous renewable energy operating scenarios to establish security boundaries; the resulting problem is highly non-convex and nonlinear, which makes it difficult for gradient-based iterative algorithms to approach global optima. Furthermore, corridors in practical power systems are coupled, which turns multi-corridor TTC calculation into a complex Pareto front search problem. This paper proposes a method based on MOEA/D-FRRMAB (Fitness–Rate–Reward Multi-Armed Bandit) featuring (1) a TTC model incorporating transient angle stability constraints, steady-state operational limits, and inter-corridor power interactions, and (2) a decomposition strategy that converts the multi-objective problem into subproblems, enhanced by MOEA/D-FRRMAB for improved Pareto front convergence and diversity. Tests on the IEEE 39-bus system demonstrate superior solution accuracy and diversity, providing dispatch centers with more reliable multi-corridor TTC strategies.
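FRRMAB-style adaptive operator selection in MOEA/D can be sketched as follows: fitness improvement rates (FIRs) of recently applied variation operators are kept in a sliding window, summed per operator, ranked with a decay factor, normalized, and combined with a UCB exploration term. This is a simplified sketch based on our understanding of the general scheme; the window size, the decay factor `decay`, the scaling factor `c`, and the FIR definition (for example, relative fitness improvement of a subproblem) are assumed, and the paper's TTC-specific details are not reproduced.

```python
import math
from collections import deque

class FRRMAB:
    """Sketch of FRRMAB-style credit assignment and operator selection (assumed parameters)."""

    def __init__(self, n_ops, window=50, decay=1.0, c=5.0):
        self.n_ops = n_ops
        self.window = deque(maxlen=window)  # recent (operator, FIR) pairs
        self.decay = decay                  # decay factor applied by rank
        self.c = c                          # exploration scaling factor

    def record(self, op, fir):
        """Store the fitness improvement rate produced by operator op."""
        self.window.append((op, fir))

    def _credit(self):
        reward = [0.0] * self.n_ops
        count = [0] * self.n_ops
        for op, fir in self.window:
            reward[op] += fir
            count[op] += 1
        # Rank operators by summed FIR (rank 0 = best) and decay lower ranks.
        ranked = sorted(range(self.n_ops), key=lambda i: reward[i], reverse=True)
        decayed = [0.0] * self.n_ops
        for rank, op in enumerate(ranked):
            decayed[op] = (self.decay ** rank) * reward[op]
        total = sum(decayed)
        frr = [d / total if total > 0 else 0.0 for d in decayed]
        return frr, count

    def select(self):
        """Pick the operator maximizing normalized credit plus a UCB bonus."""
        frr, count = self._credit()
        for op in range(self.n_ops):
            if count[op] == 0:
                return op  # operators absent from the window are tried first
        n_total = sum(count)
        scores = [frr[i] + self.c * math.sqrt(2 * math.log(n_total) / count[i])
                  for i in range(self.n_ops)]
        return max(range(self.n_ops), key=lambda i: scores[i])
```

Because counts come from the sliding window only, an operator that has not been used recently falls out of the window and is retried, which keeps the selection responsive as the search moves through different regions of the Pareto front.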

25 pages, 10974 KB  
Article
Balancing Validity and Vulnerability: Knowledge-Driven Seed Generation via LLMs for Deep Learning Library Fuzzing
by Rongtao Liao, Xuehu Yan, Zeshan Pang and Kailong Zhu
Appl. Sci. 2025, 15(19), 10396; https://doi.org/10.3390/app151910396 - 25 Sep 2025
Viewed by 985
Abstract
Fuzzing deep learning (DL) libraries is essential for uncovering security vulnerabilities in AI systems. Existing approaches enhance large language models (LLMs) with external knowledge, such as bug reports, to improve the quality of generated seeds. However, most approaches still rely on static strategies or single knowledge sources, limiting their ability to produce inputs that are syntactically valid and also expose deeper bugs. To address this challenge, we propose an adaptive seed generation approach that models knowledge-guided prompt selection as a multi-armed bandit problem. Our method first constructs two knowledge bases, one from API documentation and one from bug reports, and then dynamically selects and refines prompt strategies, tailored to the knowledge type of each base, based on real-time feedback. A multi-dimensional reward function evaluates each batch of generated seeds by measuring its error-triggering potential and behavioral diversity, enabling balanced exploration of both syntactically valid and bug-triggering test cases. Experiments on three DL libraries (PaddlePaddle, MindSpore, and OneFlow) identified 17 previously unknown crash bugs, demonstrating the effectiveness and generalizability of the proposed approach.

36 pages, 6309 KB  
Article
Utilization of Upper Confidence Bound Algorithms for Effective Subproblem Selection in Cooperative Coevolution Frameworks
by Kyung-Soo Kim
Mathematics 2025, 13(18), 3052; https://doi.org/10.3390/math13183052 - 22 Sep 2025
Viewed by 466
Abstract
In cooperative coevolution (CC) frameworks, it is essential to identify the subproblems that can contribute most to finding the optimal solutions of the objective function. In traditional CC frameworks, subproblems are selected either sequentially or based on the degree of improvement in the fitness of the optimal solution. However, these classical methods struggle to balance exploration and exploitation when selecting subproblems. To overcome this weakness, we propose new subproblem selection methods for CC frameworks based on the upper confidence bound (UCB). Our methods use UCB algorithms to balance exploration and exploitation in subproblem selection, and they incorporate a non-stationary mechanism to account for the convergence of the evolutionary algorithms. These characteristics distinguish our methods from existing approaches. In comprehensive experiments, CC frameworks using the proposed subproblem selectors achieved remarkable optimization results on most benchmark functions composed of 1000 interdependent variables. These results indicate that UCB-based subproblem selectors can significantly help CC frameworks find optimal solutions by carefully balancing exploration and exploitation in subproblem selection.
(This article belongs to the Section E1: Mathematics and Computer Science)
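One common way to make UCB non-stationary is discounting: past selection counts and rewards are multiplied by a factor `gamma` < 1 every round, so arms (here, subproblems) whose recent contributions have faded regain an exploration bonus. The sketch below is the standard discounted UCB (D-UCB) rule, shown only as an illustration; it is not necessarily the mechanism used in the paper, `gamma` and `xi` are assumed values, and using a subproblem's fitness improvement as its reward is also an assumption.

```python
import math

class DiscountedUCB:
    """Discounted UCB: statistics of past rounds are down-weighted by gamma each round."""

    def __init__(self, n_arms, gamma=0.95, xi=2.0):
        self.gamma = gamma
        self.xi = xi
        self.n = [0.0] * n_arms   # discounted selection counts
        self.s = [0.0] * n_arms   # discounted reward sums

    def select(self):
        """Return the arm (subproblem) with the largest discounted UCB index."""
        for i, ni in enumerate(self.n):
            if ni == 0.0:
                return i  # try every arm once
        total = sum(self.n)  # at least 1 after any update, so log(total) >= 0

        def index(i):
            mean = self.s[i] / self.n[i]
            return mean + math.sqrt(self.xi * math.log(total) / self.n[i])

        return max(range(len(self.n)), key=index)

    def update(self, arm, reward):
        """Discount all past statistics, then add the newest observation."""
        self.n = [self.gamma * v for v in self.n]
        self.s = [self.gamma * v for v in self.s]
        self.n[arm] += 1.0
        self.s[arm] += reward
```

As an arm goes unselected, its discounted count shrinks and its bonus grows, which forces periodic re-evaluation of subproblems whose value may have changed as the population converges.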

27 pages, 520 KB  
Article
QiMARL: Quantum-Inspired Multi-Agent Reinforcement Learning Strategy for Efficient Resource Energy Distribution in Nodal Power Stations
by Sapthak Mohajon Turjya, Anjan Bandyopadhyay, M. Shamim Kaiser and Kanad Ray
AI 2025, 6(9), 209; https://doi.org/10.3390/ai6090209 - 1 Sep 2025
Cited by 1 | Viewed by 2565
Abstract
Coupling quantum computing with multi-agent reinforcement learning (MARL) offers a promising direction for tackling intricate decision-making tasks in high-dimensional spaces. This work introduces a quantum-inspired multi-agent reinforcement learning (QiMARL) model that uses quantum parallelism to improve learning efficiency and scalability. QiMARL is tested on an energy distribution task that optimizes power distribution between generating and consuming nodal power stations. We compare the convergence time, reward performance, and scalability of QiMARL with traditional Multi-Armed Bandit (MAB) and Multi-Agent Reinforcement Learning methods, including Greedy, Upper Confidence Bound (UCB), Thompson Sampling, MADDPG, QMIX, and PPO, and conduct a comprehensive ablation study. Our findings show that QiMARL performs better in high-dimensional systems, reducing the number of training epochs needed for convergence while improving overall reward maximization. We also analyze computational complexity, which indicates that QiMARL scales better to high-dimensional quantum environments. This research opens the door to future studies of quantum-enhanced reinforcement learning (RL), with potential applications in energy optimization, traffic management, and other multi-agent coordination problems.
(This article belongs to the Special Issue Advances in Quantum Computing and Quantum Machine Learning)

22 pages, 2972 KB  
Article
Cooperative Schemes for Joint Latency and Energy Consumption Minimization in UAV-MEC Networks
by Ming Cheng, Saifei He, Yijin Pan, Min Lin and Wei-Ping Zhu
Sensors 2025, 25(17), 5234; https://doi.org/10.3390/s25175234 - 22 Aug 2025
Viewed by 1363
Abstract
The Internet of Things (IoT) has given rise to emerging applications that require massive device collaboration, heavy computation, and stringent latency guarantees. Unmanned aerial vehicle (UAV)-assisted mobile edge computing (MEC) systems can provide flexible, wide-coverage services for user devices (UDs). Jointly optimizing latency and energy consumption remains a critical yet challenging task because of the inherent trade-off between them. Joint association, offloading, and computing resource allocation are essential to achieving satisfactory system performance, but they are difficult because of the highly dynamic environment and the exponentially increasing complexity of large-scale networks. To address these challenges, we introduce a carefully designed cost function that balances latency and energy consumption, formulate the joint problem as a partially observable Markov decision process, and propose two multi-agent deep-reinforcement-learning-based schemes to tackle the long-term problem. Specifically, the multi-agent proximal policy optimization (MAPPO)-based scheme uses centralized learning and decentralized execution, while the closed-form enhanced multi-armed bandit (CF-MAB)-based scheme decouples association from offloading and computing resource allocation. In both schemes, UDs act as independent agents that learn from environmental interactions and historical decisions, make decisions to maximize their individual reward functions, and achieve implicit collaboration through the reward mechanism. Numerical results validate the effectiveness and superiority of the proposed schemes. The MAPPO-based scheme enables collaborative agent decisions for high performance in complex dynamic environments, while the CF-MAB-based scheme supports rapid, independent decisions.

27 pages, 6520 KB  
Article
Enhancing Online Statistical Decision-Making in Maritime C2 Systems: A Resilience Analysis of the LORD Procedure Under Adversarial Data Perturbations
by Victor Benicio Ardilha da Allen Alves, Gabriel Custódio Rangel, Miguel Ângelo Lellis Moreira, Igor Pinheiro de Araújo Costa, Carlos Francisco Simões Gomes and Marcos dos Santos
J. Mar. Sci. Eng. 2025, 13(8), 1547; https://doi.org/10.3390/jmse13081547 - 12 Aug 2025
Viewed by 742
Abstract
Real-time statistical inference plays a pivotal role in maritime Command and Control (C2) environments, particularly in applications such as satellite-based object detection and underwater signal interpretation. These contexts often require online multiple hypothesis testing mechanisms that make sequential decisions while preserving statistical rigor. A primary concern is control of the False Discovery Rate (FDR), as erroneous detections can impair operational effectiveness. In this study, we investigate the robustness of the Levels based On Recent Discovery (LORD) algorithm under adversarial conditions by introducing controlled perturbations into the data stream, specifically missing or corrupted p-values derived from simulated Gaussian distributions. Inspired by developments in corruption-aware multi-armed bandit models, we formulate adversarial scenarios and propose defense strategies that modify the LORD algorithm’s threshold sequence and integrate an online Benjamini–Hochberg procedure. Extensive Monte Carlo simulations show that even a single missing p-value can trigger a cascading effect that reduces statistical power, and that the proposed mitigation strategies significantly improve algorithmic resilience while maintaining FDR control. These contributions advance the development of robust online statistical decision-making tools for real-time maritime surveillance systems operating under uncertain and error-prone conditions.
(This article belongs to the Special Issue Dynamics and Control of Marine Mechatronics)
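To make the threshold sequence concrete, the sketch below implements the LORD++ variant of LORD: the test level at time t combines an initial wealth `w0` with contributions from earlier discoveries, weighted by a non-increasing sequence gamma that sums to 1. The choice gamma_j = 6 / (pi^2 j^2) is an assumption for illustration (it sums to 1 over j >= 1); practical implementations often use other sequences, and the paper's modified thresholds and defenses are not reproduced here.

```python
import math

def gamma_seq(j):
    """Assumed weight sequence: non-increasing, and sums to 1 over j >= 1."""
    return 6.0 / (math.pi ** 2 * j ** 2)


def lord_plus_plus(p_values, alpha=0.05, w0=0.025):
    """Return (rejections, levels) for a stream of p-values under LORD++."""
    assert 0 < w0 <= alpha
    discoveries = []          # 1-based times of earlier rejections
    rejections, levels = [], []
    for t, p in enumerate(p_values, start=1):
        level = gamma_seq(t) * w0                  # contribution of initial wealth
        for k, tau in enumerate(discoveries):
            # First discovery earns alpha - w0, later discoveries earn alpha.
            weight = (alpha - w0) if k == 0 else alpha
            level += weight * gamma_seq(t - tau)
        reject = p <= level
        if reject:
            discoveries.append(t)
        rejections.append(reject)
        levels.append(level)
    return rejections, levels
```

Because every level after a discovery depends on the discovery times, dropping a p-value that would have been rejected lowers all subsequent levels, which offers one intuition for the cascading loss of power described in the abstract.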

18 pages, 484 KB  
Article
LLM-Guided Ensemble Learning for Contextual Bandits with Copula and Gaussian Process Models
by Jong-Min Kim
Mathematics 2025, 13(15), 2523; https://doi.org/10.3390/math13152523 - 6 Aug 2025
Viewed by 2660
Abstract
Contextual multi-armed bandits (CMABs) are vital for sequential decision-making in areas such as recommendation systems, clinical trials, and finance. We propose a simulation framework that integrates Gaussian Process (GP)-based CMABs with vine copulas to model dependent contexts and GARCH processes to capture reward volatility. Rewards are generated from copula-transformed Beta distributions to reflect complex joint dependencies and skewness. We evaluate four policies, namely ensemble, epsilon-greedy, Thompson sampling, and Upper Confidence Bound (UCB), over 10,000 replications, assessing cumulative regret, observed reward, and cumulative reward. Thompson sampling and LLM-guided policies consistently minimize regret and maximize rewards under varied reward distributions, whereas epsilon-greedy is unstable and UCB performs moderately. Enhancing the ensemble with copula features, GP models, and dynamic policy selection driven by a large language model (LLM) yields superior adaptability and performance. Our results highlight the effectiveness of combining structured probabilistic models with LLM-based guidance for robust, adaptive decision-making in skewed, high-variance environments.
(This article belongs to the Special Issue Privacy-Preserving Machine Learning in Large Language Models (LLMs))
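Thompson sampling, which the abstract reports as consistently strong, keeps a posterior over each arm's mean reward and plays the arm whose posterior sample is largest. The minimal Beta-Bernoulli sketch below assumes binary rewards and a simulated environment (`true_probs` is hypothetical); it omits the paper's GP reward models, copula contexts, GARCH volatility, and LLM guidance.

```python
import random

def thompson_bernoulli(true_probs, horizon=2000, seed=1):
    """Simulate Beta-Bernoulli Thompson sampling and return pull counts per arm."""
    rng = random.Random(seed)
    k = len(true_probs)
    successes = [1.0] * k   # Beta(1, 1) prior: first shape parameter per arm
    failures = [1.0] * k    # Beta(1, 1) prior: second shape parameter per arm
    pulls = [0] * k
    for _ in range(horizon):
        # Draw one plausible mean per arm from its posterior and play the best draw.
        draws = [rng.betavariate(successes[i], failures[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: draws[i])
        reward = 1 if rng.random() < true_probs[arm] else 0   # simulated Bernoulli reward
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls
```

Exploration here comes from posterior uncertainty rather than from a fixed schedule: arms with few observations have wide posteriors and occasionally produce the largest draw, while clearly inferior arms are sampled less and less often.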

16 pages, 2246 KB  
Article
Context-Aware Beam Selection for IRS-Assisted mmWave V2I Communications
by Ricardo Suarez del Valle, Abdulkadir Kose and Haeyoung Lee
Sensors 2025, 25(13), 3924; https://doi.org/10.3390/s25133924 - 24 Jun 2025
Cited by 1 | Viewed by 1050
Abstract
Millimeter wave (mmWave) technology, with its ultra-high bandwidth and low latency, holds significant promise for vehicle-to-everything (V2X) communications. However, it faces challenges such as high propagation losses and limited coverage in dense urban vehicular environments. Intelligent Reflecting Surfaces (IRSs) help address these issues by enhancing mmWave signal paths around obstacles, thereby maintaining reliable communication. This paper introduces a novel Contextual Multi-Armed Bandit (C-MAB) algorithm that dynamically adapts beam and IRS selections based on real-time environmental context. Simulation results show that the proposed C-MAB approach significantly improves link stability, doubling average beam sojourn times compared with traditional SNR-based strategies and standard MAB methods, and achieving up to fourfold performance gains in IRS-assisted scenarios. The approach enables optimized resource allocation and significantly improves coverage, data rate, and resource utilization compared with conventional methods.
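Contextual bandits such as the one described here choose an action based on an observed context vector. As an illustration only, the sketch below implements disjoint LinUCB, a standard contextual bandit algorithm, in pure Python: each arm (for example, a beam and IRS pairing) keeps a ridge-regression estimate of reward as a linear function of the context, and the arm with the highest optimistic estimate is chosen. The linear reward model, the context features (for example, vehicle position or speed), and the `alpha` parameter are our assumptions, not the paper's C-MAB design.

```python
import math

class LinUCBArm:
    """One arm of disjoint LinUCB: ridge regression with A = I + sum x x^T, b = sum r x."""

    def __init__(self, d):
        # Keep A^{-1} directly (it starts as the identity), so no explicit inversion is needed.
        self.A_inv = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
        self.b = [0.0] * d

    def _a_inv_times(self, v):
        return [sum(row[j] * v[j] for j in range(len(v))) for row in self.A_inv]

    def ucb(self, x, alpha):
        """Optimistic reward estimate theta^T x + alpha * sqrt(x^T A^{-1} x)."""
        theta = self._a_inv_times(self.b)          # ridge estimate theta = A^{-1} b
        a_inv_x = self._a_inv_times(x)
        mean = sum(t * xi for t, xi in zip(theta, x))
        width = math.sqrt(max(sum(xi * a for xi, a in zip(x, a_inv_x)), 0.0))
        return mean + alpha * width

    def update(self, x, reward):
        """Add observation (x, reward) via a Sherman-Morrison rank-one update."""
        a_inv_x = self._a_inv_times(x)
        denom = 1.0 + sum(xi * a for xi, a in zip(x, a_inv_x))
        d = len(x)
        # (A + x x^T)^{-1} = A^{-1} - (A^{-1} x)(A^{-1} x)^T / (1 + x^T A^{-1} x), A symmetric
        self.A_inv = [[self.A_inv[i][j] - a_inv_x[i] * a_inv_x[j] / denom
                       for j in range(d)] for i in range(d)]
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


def linucb_select(arms, x, alpha=1.0):
    """Pick the arm with the largest optimistic reward estimate for context x."""
    return max(range(len(arms)), key=lambda i: arms[i].ucb(x, alpha))
```

Updating the inverse directly keeps each step at O(d^2) work per arm, which suits small context vectors; a production version would more likely use a linear-algebra library.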

18 pages, 803 KB  
Article
Gaussian Process with Vine Copula-Based Context Modeling for Contextual Multi-Armed Bandits
by Jong-Min Kim
Mathematics 2025, 13(13), 2058; https://doi.org/10.3390/math13132058 - 21 Jun 2025
Cited by 1 | Viewed by 1072
Abstract
We propose a novel contextual multi-armed bandit (CMAB) framework that integrates copula-based context generation with Gaussian Process (GP) regression for reward modeling, addressing complex dependency structures and uncertainty in sequential decision-making. Context vectors are generated using Gaussian and vine copulas to capture nonlinear dependencies, while arm-specific reward functions are modeled via GP regression with Beta-distributed targets. We evaluate three widely used bandit policies, Thompson Sampling (TS), ε-Greedy, and Upper Confidence Bound (UCB), in simulated environments informed by real-world datasets, including Boston Housing and Wine Quality. The Boston Housing dataset exemplifies heterogeneous decision boundaries relevant to housing-related marketing, while the Wine Quality dataset introduces arm differentiation based on sensory features. Our empirical results indicate that the ε-Greedy policy consistently achieves the highest cumulative reward and lowest regret across multiple runs, outperforming both GP-based TS and UCB in high-dimensional, copula-structured contexts. These findings suggest that combining copula theory with GP modeling provides a robust and flexible foundation for data-driven sequential experimentation in domains with complex contextual dependencies.
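For reference, the ε-greedy rule is simple: with probability ε it explores a uniformly random arm, and otherwise it exploits the arm with the highest running-mean estimate. This context-free sketch assumes Gaussian rewards with unit variance and hypothetical arm means; the paper's version operates on copula-generated contexts with GP reward models, which are not reproduced here.

```python
import random

def epsilon_greedy(true_means, horizon=5000, epsilon=0.1, seed=0):
    """Simulate epsilon-greedy on Gaussian arms; return (counts, estimates, total reward)."""
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k
    estimates = [0.0] * k
    total = 0.0
    for _ in range(horizon):
        if rng.random() < epsilon:
            arm = rng.randrange(k)                              # explore uniformly
        else:
            arm = max(range(k), key=lambda i: estimates[i])     # exploit current best
        reward = rng.gauss(true_means[arm], 1.0)                # assumed unit-variance noise
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # running mean
        total += reward
    return counts, estimates, total
```

A fixed ε keeps exploring at the same rate forever, so in a stationary problem it pays a constant exploration cost; the abstract's finding that ε-greedy nevertheless performed best in its copula-structured settings is an empirical result specific to those experiments.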
