MDPI - Publisher of Open Access Journals

24 pages, 2171 KiB

Open Access | Feature Paper | Article

Cost-Efficient Distributed Learning via Combinatorial Multi-Armed Bandits

by Maximilian Egger, Rawad Bitar, Antonia Wachter-Zeh and Deniz Gündüz

Entropy 2025, 27(5), 541; https://doi.org/10.3390/e27050541 - 20 May 2025

Viewed by 482

We consider the distributed stochastic gradient descent problem, where a main node distributes gradient calculations among n workers. By assigning tasks to all workers and waiting only for the k fastest ones, the main node can trade off the algorithm’s error with its runtime by gradually increasing k as the algorithm evolves. However, this strategy, referred to as adaptive k-sync, neglects the cost of unused computations and of communicating models to workers that exhibit straggling behavior. We propose a cost-efficient scheme that assigns tasks only to k workers and gradually increases k. To learn which workers are the fastest while assigning gradient calculations, we introduce the use of a combinatorial multi-armed bandit model. Assuming workers have exponentially distributed response times with different means, we provide both empirical and theoretical guarantees on the regret of our strategy, i.e., the extra time spent learning the mean response times of the workers. Furthermore, we propose and analyze a strategy that is applicable to a large class of response time distributions. Compared to adaptive k-sync, our scheme achieves significantly lower errors with the same computational effort and less downlink communication, while being inferior in terms of speed.
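The idea of learning the k fastest workers while assigning tasks can be illustrated with a minimal sketch. This is not the authors' algorithm: the function names (`select_workers`, `run_k_fastest`), the fixed value of k, and the confidence radius `sqrt(2 ln t / count)` are assumptions chosen for illustration only. Each round, the sketch assigns tasks to the k workers with the smallest optimistic estimate of mean response time and updates the estimates from exponentially distributed samples.

```python
import math
import random


def select_workers(counts, means, k, t):
    """Return the k workers with the smallest optimistic mean response time."""
    def lower_bound(i):
        if counts[i] == 0:
            return float("-inf")  # unsampled workers are tried first
        # Illustrative confidence radius (an assumption, not from the paper).
        return means[i] - math.sqrt(2.0 * math.log(t) / counts[i])
    return sorted(range(len(counts)), key=lower_bound)[:k]


def run_k_fastest(true_means, k, rounds, seed=0):
    """Simulate assigning tasks to k of n workers per round."""
    rng = random.Random(seed)
    n = len(true_means)
    counts = [0] * n    # number of tasks assigned to each worker
    means = [0.0] * n   # running estimate of each mean response time
    for t in range(1, rounds + 1):
        for i in select_workers(counts, means, k, t):
            # Response time drawn from an exponential with the true mean.
            sample = rng.expovariate(1.0 / true_means[i])
            counts[i] += 1
            means[i] += (sample - means[i]) / counts[i]
    return counts, means


if __name__ == "__main__":
    counts, means = run_k_fastest([0.5, 1.0, 5.0, 10.0], k=2, rounds=2000)
    print("tasks per worker:", counts)
```

In this sketch only k workers receive tasks per round, so no computation is discarded; the price is the exploration rounds spent on slow workers, which is the regret the abstract refers to.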

(This article belongs to the Special Issue Information-Theoretic Approaches for Machine Learning and AI)

53 pages, 1295 KiB

Open Access | Review

Selective Reviews of Bandit Problems in AI via a Statistical View

by Pengjie Zhou, Haoyu Wei and Huiming Zhang

Mathematics 2025, 13(4), 665; https://doi.org/10.3390/math13040665 - 18 Feb 2025

Cited by 3 | Viewed by 806

Abstract

Reinforcement Learning (RL) is a widely researched area in artificial intelligence that focuses on teaching agents decision-making through interactions with their environment. A key subset includes multi-armed bandit (MAB) and stochastic continuum-armed bandit (SCAB) problems, which model sequential decision-making under uncertainty. This review outlines the foundational models and assumptions of bandit problems, explores non-asymptotic theoretical tools like concentration inequalities and minimax regret bounds, and compares frequentist and Bayesian algorithms for managing exploration–exploitation trade-offs. Additionally, we explore K-armed contextual bandits and SCAB, focusing on their methodologies and regret analyses. We also examine the connections between SCAB problems and functional data analysis. Finally, we highlight recent advances and ongoing challenges in the field.

(This article belongs to the Special Issue Advances in Statistical AI and Causal Inference)

21 pages, 2816 KiB

Open Access | Article

Reinforcement Learning-Based Resource Allocation and Energy Efficiency Optimization for a Space–Air–Ground-Integrated Network

by Zhiyu Chen, Hongxi Zhou, Siyuan Du, Jiayan Liu, Luyang Zhang and Qi Liu

Electronics 2024, 13(9), 1792; https://doi.org/10.3390/electronics13091792 - 6 May 2024

Cited by 2 | Viewed by 2127

Abstract

With the construction and development of the smart grid, the power business places higher requirements on the communication capability of the network. To improve the energy efficiency of the space–air–ground-integrated power three-dimensional fusion communication network, we formulate an optimization problem for joint air platform (AP) flight path selection, ground power facility (GPF) association, and power control. We decompose the problem into two subproblems: the AP flight path selection subproblem, and the GPF association and power control subproblem. First, based on the GPF distribution and throughput weights, we model the AP flight path selection subproblem as a Markov Decision Process (MDP) and propose a multi-agent iterative optimization algorithm based on the comprehensive judgment of GPF positions and workload. Second, we model the GPF association and power control subproblem as a multi-agent, time-varying K-armed bandit model and propose an algorithm based on multi-agent Temporal Difference (TD) learning. Then, by alternately iterating between the two subproblems, we propose a reinforcement learning (RL)-based joint optimization algorithm. Finally, the simulation results indicate that, compared to the three baseline algorithms (random path, average transmit power, and random device association), the proposed algorithm improves the overall energy efficiency of the system by 16.23%, 86.29%, and 5.11%, respectively, under various conditions (including different noise power levels, GPF bandwidth, and GPF quantities).

(This article belongs to the Special Issue 5G and 6G Wireless Systems: Challenges, Insights, and Opportunities)

23 pages, 2122 KiB

Open Access | Article

Scheduling Sparse LEO Satellite Transmissions for Remote Water Level Monitoring

by Garrett Kinman, Željko Žilić and David Purnell

Sensors 2023, 23(12), 5581; https://doi.org/10.3390/s23125581 - 14 Jun 2023

Cited by 1 | Viewed by 2182

Abstract

This paper explores the use of low Earth orbit (LEO) satellite links in long-term monitoring of water levels across remote areas. Emerging sparse LEO satellite constellations maintain only a sporadic connection to the ground station, and transmissions need to be scheduled for satellite overfly periods. For remote sensing, optimizing energy consumption is critical, and we develop a learning approach for scheduling the transmission times from the sensors. Our online learning-based approach combines Monte Carlo and modified k-armed bandit approaches to produce an inexpensive scheme that is applicable to scheduling any LEO satellite transmissions. We demonstrate its ability to adapt in three common scenarios and to reduce the transmission energy 20-fold, and we provide the means to explore the parameters. The presented study is applicable to a wide range of IoT applications in areas with no existing wireless coverage.

(This article belongs to the Special Issue Energy-Efficient Communication Networks and Systems)

8 pages, 1279 KiB

Open Access | Article

Decentralized Blind Spectrum Selection in Cognitive Radio Networks Considering Handoff Cost

by Yongqun Chen, Huaibei Zhou, Ruoshan Kong, Li Zhu and Huaqing Mao

Future Internet 2017, 9(2), 10; https://doi.org/10.3390/fi9020010 - 31 Mar 2017

Cited by 4 | Viewed by 5231

Abstract

Due to the spectrum-varying nature of cognitive radio networks, secondary users are required to perform spectrum handoffs when the spectrum is occupied by primary users, which leads to a handoff delay. In this paper, based on the multi-armed bandit framework of medium access in decentralized cognitive radio networks, we investigate the blind spectrum selection problem of secondary users whose sensing ability is limited and for whom the channel statistics are a priori unknown, taking the handoff delay into consideration as a fixed handoff cost. In this scenario, secondary users have to choose between staying on the current spectrum with low availability and handing off to another spectrum with higher availability. We model the problem and investigate the performance of three representative policies, i.e., ρ^PRE, SL(K), and kth-UCB1. The simulation results show that, despite the inclusion of the fixed handoff cost, these policies achieve the same asymptotic performance as without the handoff cost. Moreover, a comparison of these policies shows that the kth-UCB1 policy has better overall performance.
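The trade-off between staying on a channel and paying a fixed cost to switch can be sketched with the standard UCB1 index. This is plain single-user UCB1, not the kth-UCB1, SL(K), or ρ^PRE policies studied in the article; the function name `ucb1_with_handoff`, the Bernoulli availability model, and the way the cost is subtracted from the reward are all assumptions made for illustration.

```python
import math
import random


def ucb1_with_handoff(avail_probs, rounds, handoff_cost, seed=0):
    """Single-user UCB1 channel selection with a fixed cost per switch."""
    rng = random.Random(seed)
    n = len(avail_probs)
    counts = [0] * n    # times each channel was selected
    means = [0.0] * n   # empirical availability of each channel
    current = None      # channel used in the previous round
    handoffs = 0
    total = 0.0
    for t in range(1, rounds + 1):
        if t <= n:
            choice = t - 1  # sample every channel once
        else:
            # Standard UCB1 index: empirical mean plus exploration bonus.
            choice = max(
                range(n),
                key=lambda i: means[i] + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        if current is not None and choice != current:
            handoffs += 1
            total -= handoff_cost  # assumed: cost is paid on every switch
        current = choice
        # Channel is available (reward 1) with its true probability.
        reward = 1.0 if rng.random() < avail_probs[choice] else 0.0
        counts[choice] += 1
        means[choice] += (reward - means[choice]) / counts[choice]
        total += reward
    return counts, handoffs, total


if __name__ == "__main__":
    counts, handoffs, total = ucb1_with_handoff([0.2, 0.5, 0.9], 5000, 0.5)
    print("selections:", counts, "handoffs:", handoffs, "net reward:", total)
```

Because UCB1 selects suboptimal channels only logarithmically often, the number of handoffs in this sketch also grows slowly, which gives some intuition for why a fixed handoff cost need not change the asymptotic behavior.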

(This article belongs to the Special Issue Context-Awareness of Mobile Systems)

22 pages, 891 KiB

Open Access | Article

An Artificial Bee Colony Algorithm for the Job Shop Scheduling Problem with Random Processing Times

by Rui Zhang and Cheng Wu

Entropy 2011, 13(9), 1708-1729; https://doi.org/10.3390/e13091708 - 19 Sep 2011

Cited by 43 | Viewed by 9164

Abstract

Due to the influence of unpredictable random events, the processing time of each operation should be treated as a random variable if we aim at a robust production schedule. However, compared with the extensive research on the deterministic model, the stochastic job shop scheduling problem (SJSSP) has not received sufficient attention. In this paper, we propose an artificial bee colony (ABC) algorithm for SJSSP with the objective of minimizing the maximum lateness (which is an index of service quality). First, we propose a performance estimate for preliminary screening of the candidate solutions. Then, the K-armed bandit model is utilized to reduce the computational burden of the exact evaluation process (performed through Monte Carlo simulation). Finally, the computational results on different-scale test problems validate the effectiveness and efficiency of the proposed approach.

Search Results (6)
