Search Results (6)

Search Parameters:
Keywords = K-armed bandit

24 pages, 2171 KiB  
Article
Cost-Efficient Distributed Learning via Combinatorial Multi-Armed Bandits
by Maximilian Egger, Rawad Bitar, Antonia Wachter-Zeh and Deniz Gündüz
Entropy 2025, 27(5), 541; https://doi.org/10.3390/e27050541 - 20 May 2025
Viewed by 482
Abstract
We consider the distributed stochastic gradient descent problem, where a main node distributes gradient calculations among n workers. By assigning tasks to all workers and waiting only for the k fastest ones, the main node can trade off the algorithm’s error with its runtime by gradually increasing k as the algorithm evolves. However, this strategy, referred to as adaptive k-sync, neglects the cost of unused computations and of communicating models to workers that reveal a straggling behavior. We propose a cost-efficient scheme that assigns tasks only to k workers, and gradually increases k. To learn which workers are the fastest while assigning gradient calculations, we introduce the use of a combinatorial multi-armed bandit model. Assuming workers have exponentially distributed response times with different means, we provide both empirical and theoretical guarantees on the regret of our strategy, i.e., the extra time spent learning the mean response times of the workers. Furthermore, we propose and analyze a strategy that is applicable to a large class of response time distributions. Compared to adaptive k-sync, our scheme achieves significantly lower errors with the same computational efforts and less downlink communication while being inferior in terms of speed.
(This article belongs to the Special Issue Information-Theoretic Approaches for Machine Learning and AI)

53 pages, 1295 KiB  
Review
Selective Reviews of Bandit Problems in AI via a Statistical View
by Pengjie Zhou, Haoyu Wei and Huiming Zhang
Mathematics 2025, 13(4), 665; https://doi.org/10.3390/math13040665 - 18 Feb 2025
Cited by 3 | Viewed by 806
Abstract
Reinforcement Learning (RL) is a widely researched area in artificial intelligence that focuses on teaching agents decision-making through interactions with their environment. A key subset includes multi-armed bandit (MAB) and stochastic continuum-armed bandit (SCAB) problems, which model sequential decision-making under uncertainty. This review outlines the foundational models and assumptions of bandit problems, explores non-asymptotic theoretical tools like concentration inequalities and minimax regret bounds, and compares frequentist and Bayesian algorithms for managing exploration–exploitation trade-offs. Additionally, we explore K-armed contextual bandits and SCAB, focusing on their methodologies and regret analyses. We also examine the connections between SCAB problems and functional data analysis. Finally, we highlight recent advances and ongoing challenges in the field.
(This article belongs to the Special Issue Advances in Statistical AI and Causal Inference)

21 pages, 2816 KiB  
Article
Reinforcement Learning-Based Resource Allocation and Energy Efficiency Optimization for a Space–Air–Ground-Integrated Network
by Zhiyu Chen, Hongxi Zhou, Siyuan Du, Jiayan Liu, Luyang Zhang and Qi Liu
Electronics 2024, 13(9), 1792; https://doi.org/10.3390/electronics13091792 - 6 May 2024
Cited by 2 | Viewed by 2127
Abstract
With the construction and development of the smart grid, the power business places higher requirements on the communication capability of the network. In order to improve the energy efficiency of the space–air–ground-integrated power three-dimensional fusion communication network, we establish an optimization problem for joint air platform (AP) flight path selection, ground power facility (GPF) association, and power control. To solve it, we decompose the problem into two subproblems: the AP flight path selection subproblem and the GPF association and power control subproblem. Firstly, based on the GPF distribution and throughput weights, we model the AP flight path selection subproblem as a Markov Decision Process (MDP) and propose a multi-agent iterative optimization algorithm based on the comprehensive judgment of GPF positions and workload. Secondly, we model the GPF association and power control subproblem as a multi-agent, time-varying K-armed bandit model and propose an algorithm based on multi-agent Temporal Difference (TD) learning. Then, by alternately iterating between the two subproblems, we propose a reinforcement learning (RL)-based joint optimization algorithm. Finally, the simulation results indicate that, compared to the three baseline algorithms (random path, average transmit power, and random device association), the proposed algorithm improves the overall energy efficiency of the system by 16.23%, 86.29%, and 5.11%, respectively, under various conditions (including different noise power levels, GPF bandwidth, and GPF quantities).
(This article belongs to the Special Issue 5G and 6G Wireless Systems: Challenges, Insights, and Opportunities)

23 pages, 2122 KiB  
Article
Scheduling Sparse LEO Satellite Transmissions for Remote Water Level Monitoring
by Garrett Kinman, Željko Žilić and David Purnell
Sensors 2023, 23(12), 5581; https://doi.org/10.3390/s23125581 - 14 Jun 2023
Cited by 1 | Viewed by 2182
Abstract
This paper explores the use of low earth orbit (LEO) satellite links in long-term monitoring of water levels across remote areas. Emerging sparse LEO satellite constellations maintain sporadic connection to the ground station, and transmissions need to be scheduled for satellite overfly periods. For remote sensing, energy consumption optimization is critical, and we develop a learning approach for scheduling the transmission times from the sensors. Our online learning-based approach combines Monte Carlo and modified k-armed bandit approaches to produce an inexpensive scheme that is applicable to scheduling any LEO satellite transmissions. We demonstrate its ability to adapt in three common scenarios, to reduce the transmission energy 20-fold, and to provide the means to explore the parameters. The presented study is applicable to a wide range of IoT applications in areas with no existing wireless coverage.
(This article belongs to the Special Issue Energy-Efficient Communication Networks and Systems)

8 pages, 1279 KiB  
Article
Decentralized Blind Spectrum Selection in Cognitive Radio Networks Considering Handoff Cost
by Yongqun Chen, Huaibei Zhou, Ruoshan Kong, Li Zhu and Huaqing Mao
Future Internet 2017, 9(2), 10; https://doi.org/10.3390/fi9020010 - 31 Mar 2017
Cited by 4 | Viewed by 5231
Abstract
Due to the spectrum-varying nature of cognitive radio networks, secondary users are required to perform spectrum handoffs when the spectrum is occupied by primary users, which leads to a handoff delay. In this paper, based on the multi-armed bandit framework of medium access in decentralized cognitive radio networks, we investigate the blind spectrum selection problem of secondary users whose cognitive radio sensing ability is limited and for whom the channel statistics are a priori unknown, taking the handoff delay into consideration as a fixed handoff cost. In this scenario, secondary users have to choose between staying on the foregoing spectrum with low availability and handing off to another spectrum with higher availability. We model the problem and investigate the performance of three representative policies, i.e., ρPRE, SL(K), kth-UCB1. The simulation results show that, despite the inclusion of the fixed handoff cost, these policies achieve the same asymptotic performance as that without handoff cost. Moreover, through comparison of these policies, we find that the kth-UCB1 policy has better overall performance.
(This article belongs to the Special Issue Context-Awareness of Mobile Systems)

22 pages, 891 KiB  
Article
An Artificial Bee Colony Algorithm for the Job Shop Scheduling Problem with Random Processing Times
by Rui Zhang and Cheng Wu
Entropy 2011, 13(9), 1708-1729; https://doi.org/10.3390/e13091708 - 19 Sep 2011
Cited by 43 | Viewed by 9164
Abstract
Due to the influence of unpredictable random events, the processing time of each operation should be treated as a random variable if we aim at a robust production schedule. However, compared with the extensive research on the deterministic model, the stochastic job shop scheduling problem (SJSSP) has not received sufficient attention. In this paper, we propose an artificial bee colony (ABC) algorithm for SJSSP with the objective of minimizing the maximum lateness (which is an index of service quality). First, we propose a performance estimate for preliminary screening of the candidate solutions. Then, the K-armed bandit model is utilized for reducing the computational burden in the exact evaluation (through Monte Carlo simulation) process. Finally, the computational results on different-scale test problems validate the effectiveness and efficiency of the proposed approach.