Search Results (7)

Search Parameters:
Keywords = non-stationary multi-armed bandit

25 pages, 974 KB  
Article
Thompson Sampling for Non-Stationary Bandit Problems
by Han Qi, Fei Guo and Li Zhu
Entropy 2025, 27(1), 51; https://doi.org/10.3390/e27010051 - 9 Jan 2025
Cited by 4 | Viewed by 4294
Abstract
Non-stationary multi-armed bandit (MAB) problems have recently attracted extensive attention. We focus on the abruptly changing scenario, where reward distributions remain constant for a certain period and change at unknown time steps. Although Thompson sampling (TS) has shown success in non-stationary settings, there is currently no regret bound analysis for TS with uninformative priors. To address this, we propose two algorithms, discounted TS and sliding-window TS, designed for sub-Gaussian reward distributions. For these algorithms, we establish an upper bound on the expected regret by bounding the expected number of times a suboptimal arm is played. We show that the regret upper bounds of both algorithms are Õ(√(T·B_T)), where T is the time horizon and B_T is the number of breakpoints. This upper bound matches the lower bound for abruptly changing problems up to a logarithmic factor. Empirical comparisons with other non-stationary bandit algorithms highlight the competitive performance of our proposed methods.
(This article belongs to the Section Information Theory, Probability and Statistics)
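A minimal sketch of the discounting mechanism behind discounted TS as described in the abstract above. This is an illustrative Gaussian-posterior variant: the class name, the discount factor gamma, and the 1/(n+1) posterior variance are our assumptions, not the paper's exact construction.

```python
import random

class DiscountedGaussianTS:
    """Discounted Thompson sampling sketch for (sub-)Gaussian rewards.

    Every arm's sufficient statistics are multiplied by gamma each round,
    so old observations fade and the posterior can re-adapt after an
    abrupt change in the reward distribution.
    """

    def __init__(self, n_arms, gamma=0.99):
        self.gamma = gamma
        self.counts = [0.0] * n_arms  # discounted pull counts
        self.sums = [0.0] * n_arms    # discounted reward sums

    def select_arm(self):
        samples = []
        for n, s in zip(self.counts, self.sums):
            mean = s / n if n > 0 else 0.0
            std = (1.0 / (n + 1.0)) ** 0.5  # wide for rarely played arms
            samples.append(random.gauss(mean, std))
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, reward):
        # Discount every arm's statistics, then record the new observation.
        self.counts = [self.gamma * c for c in self.counts]
        self.sums = [self.gamma * s for s in self.sums]
        self.counts[arm] += 1.0
        self.sums[arm] += reward
```

Sliding-window TS replaces the geometric discount with a hard cutoff: only the most recent observations per arm enter the posterior.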

18 pages, 1206 KB  
Article
Use of Logarithmic Rates in Multi-Armed Bandit-Based Transmission Rate Control Embracing Frame Aggregations in Wireless Networks
by Soohyun Cho
Appl. Sci. 2023, 13(14), 8485; https://doi.org/10.3390/app13148485 - 22 Jul 2023
Cited by 2 | Viewed by 1845
Abstract
Herein, we propose using the logarithmic values of data transmission rates in multi-armed bandit (MAB) algorithms that adjust the modulation and coding scheme (MCS) levels of data packets in carrier-sensing multiple access/collision avoidance (CSMA/CA) wireless networks. We argue that the utilities of the data transmission rates of the MCS levels may not be proportional to their nominal values, and suggest using their logarithmic values instead when MAB algorithms compute the expected throughputs of the MCS levels. To demonstrate the effectiveness of the proposal, we introduce two MAB algorithms that adopt the logarithmic transmission rates. The proposed MAB algorithms also support the frame aggregations available in wireless network standards that aim for high throughput. In addition, they use a sliding window over time to adapt to rapidly changing wireless channel environments. To evaluate the performance of the proposed MAB algorithms, we used the event-driven network simulator ns-3 across various scenarios of stationary and non-stationary wireless network environments, including multiple spatial streams and frame aggregations. The experiment results show that the proposed MAB algorithms outperform those that do not adopt the logarithmic transmission rates in both the stationary and non-stationary scenarios.
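The core idea, scoring each MCS level by the logarithm of its nominal rate weighted by its recent delivery ratio inside a sliding-window UCB, can be sketched as follows. The rate table, window length, and exploration constant are illustrative assumptions, not values from the paper.

```python
import math
from collections import deque

NOMINAL_RATES = [6.5, 13.0, 19.5, 26.0, 39.0, 52.0]  # Mbit/s, example values

class LogRateSWUCB:
    """Sliding-window UCB over MCS levels using log-scaled rates."""

    def __init__(self, rates, window=100, c=1.0):
        self.log_rates = [math.log(r) for r in rates]
        self.window = window
        self.c = c
        self.history = deque()  # (mcs, success) pairs, bounded by window
        self.t = 0

    def select(self):
        self.t += 1
        counts = [0] * len(self.log_rates)
        succ = [0] * len(self.log_rates)
        for mcs, ok in self.history:   # only the recent window counts
            counts[mcs] += 1
            succ[mcs] += ok
        best, best_score = 0, -1.0
        for i, lr in enumerate(self.log_rates):
            if counts[i] == 0:
                return i  # try every level at least once
            mean = lr * succ[i] / counts[i]  # expected log-throughput
            bonus = self.c * math.sqrt(
                math.log(min(self.t, self.window)) / counts[i])
            if mean + bonus > best_score:
                best, best_score = i, mean + bonus
        return best

    def update(self, mcs, success):
        self.history.append((mcs, success))
        if len(self.history) > self.window:
            self.history.popleft()
```

The log scale compresses the gap between high nominal rates, so a level's recent delivery ratio weighs relatively more than its headline speed, which is the utility argument the abstract makes.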

22 pages, 671 KB  
Article
LLM-Informed Multi-Armed Bandit Strategies for Non-Stationary Environments
by J. de Curtò, I. de Zarzà, Gemma Roig, Juan Carlos Cano, Pietro Manzoni and Carlos T. Calafate
Electronics 2023, 12(13), 2814; https://doi.org/10.3390/electronics12132814 - 25 Jun 2023
Cited by 26 | Viewed by 11981
Abstract
In this paper, we introduce an innovative approach to handling the multi-armed bandit (MAB) problem in non-stationary environments, harnessing the predictive power of large language models (LLMs). With the realization that traditional bandit strategies, including epsilon-greedy and upper confidence bound (UCB), may struggle in the face of dynamic changes, we propose a strategy informed by LLMs that offers dynamic guidance on exploration versus exploitation, contingent on the current state of the bandits. We bring forward a new non-stationary bandit model with fluctuating reward distributions and illustrate how LLMs can be employed to guide the choice of bandit amid this variability. Experimental outcomes illustrate the potential of our LLM-informed strategy, demonstrating its adaptability to the fluctuating nature of the bandit problem, while maintaining competitive performance against conventional strategies. This study provides key insights into the capabilities of LLMs in enhancing decision-making processes in dynamic and uncertain scenarios.
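The control flow of such a strategy can be sketched as below. The `advise` function here is a hypothetical placeholder standing in for the paper's LLM call; in the paper's setting it would be a prompt summarizing the bandit state, and the heuristic inside it is entirely our own stand-in.

```python
import random

def advise(estimates, counts, t):
    # Hypothetical stand-in for an LLM prompt such as "given these arm
    # estimates and pull counts, should we explore or exploit?".
    # Placeholder heuristic: explore when estimates are close together
    # or when we have pulled fewer times than there are arms.
    spread = max(estimates) - min(estimates)
    return "explore" if spread < 0.1 or t < len(estimates) else "exploit"

def llm_informed_step(estimates, counts, t):
    """One decision step: the advisor gates exploration vs. exploitation."""
    if advise(estimates, counts, t) == "explore":
        return random.randrange(len(estimates))   # uniform exploration
    return max(range(len(estimates)), key=estimates.__getitem__)  # greedy
```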

16 pages, 7595 KB  
Article
Using Deep Reinforcement Learning with Hierarchical Risk Parity for Portfolio Optimization
by Adrian Millea and Abbas Edalat
Int. J. Financial Stud. 2023, 11(1), 10; https://doi.org/10.3390/ijfs11010010 - 29 Dec 2022
Cited by 9 | Viewed by 10659
Abstract
We devise a hierarchical decision-making architecture for portfolio optimization on multiple markets. At the highest level, a Deep Reinforcement Learning (DRL) agent selects among a number of discrete actions, each representing a low-level agent. For the low-level agents, we use a set of Hierarchical Risk Parity (HRP) and Hierarchical Equal Risk Contribution (HERC) models with different hyperparameters, which all run in parallel, off-market (in a simulation). The state on which the DRL agent bases its choice of the next low-level agent to act consists of the stacked recent performances of all agents. Thus, the model resembles a stateful, non-stationary, multi-armed bandit, where the performance of the individual arms changes with time and is assumed to depend on the recent history. We perform experiments on the cryptocurrency market (117 assets), the stock market (46 assets) and the foreign exchange market (28 pairs), showing the excellent robustness and performance of the overall system. Moreover, we eliminate the need for retraining and are able to deal with large testing sets successfully.
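The state construction described above can be sketched as follows: the high-level agent observes the stacked recent returns of every low-level agent. As a stand-in for the trained DRL policy, this sketch simply picks the agent with the best recent average; the class name, window length, and greedy rule are our illustrative assumptions.

```python
from collections import deque

class AgentSelector:
    """High-level selector over low-level portfolio agents (sketch)."""

    def __init__(self, n_agents, window=5):
        # One bounded history of recent returns per low-level agent.
        self.histories = [deque(maxlen=window) for _ in range(n_agents)]

    def observe(self, returns):
        # `returns` holds each low-level agent's simulated (off-market) return.
        for hist, r in zip(self.histories, returns):
            hist.append(r)

    def state(self):
        # Stacked recent performances: the DRL agent's input in the paper.
        return [list(h) for h in self.histories]

    def select(self):
        # Stand-in policy: choose the agent with the best recent average.
        avgs = [sum(h) / len(h) if h else 0.0 for h in self.histories]
        return max(range(len(avgs)), key=avgs.__getitem__)
```

In the paper this selection is made by a DRL policy trained on the stacked-performance state rather than by the greedy rule used here.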

25 pages, 29480 KB  
Article
A Fine-Grain Batching-Based Task Allocation Algorithm for Spatial Crowdsourcing
by Yuxin Jiao, Zhikun Lin, Long Yu and Xiaozhu Wu
ISPRS Int. J. Geo-Inf. 2022, 11(3), 203; https://doi.org/10.3390/ijgi11030203 - 17 Mar 2022
Cited by 8 | Viewed by 3952
Abstract
Task allocation is a critical issue in spatial crowdsourcing. Although the batching strategy performs better than the real-time matching mode, it still has two drawbacks: (1) because the granularity of the batch size set obtained by batching is too coarse, matching accuracy suffers, yet designing batch sizes for all possible delays would incur a large computational overhead; (2) ignoring non-stationary factors means a change in the optimal batch size cannot be detected as soon as possible. Therefore, this paper proposes a fine-grained, batching-based task allocation algorithm (FGBTA) that considers a non-stationary setting. In the batch method, the algorithm first uses a variable step size to allow fine-grained exploration around the predicted value given by the multi-armed bandit (MAB) algorithm and uses the results of pseudo-matching to calculate the batch utility. Then, the batch size with higher utility is selected, and an exact maximum-weight matching algorithm is used to obtain the allocation result within the batch. To cope with non-stationary changes, we use the sliding-window (SW) method to retain the latest batch utilities and discard history that is too old, so as to achieve refined batching and adapt to temporal changes. In addition, we take into account the benefits of requesters, workers, and the platform. Experiments on real and synthetic data show that this method can accomplish spatial crowdsourcing task assignment effectively and can adapt to the non-stationary setting quickly. This paper mainly focuses on the spatial crowdsourcing task of ride-hailing.

26 pages, 3947 KB  
Article
Non Stationary Multi-Armed Bandit: Empirical Evaluation of a New Concept Drift-Aware Algorithm
by Emanuele Cavenaghi, Gabriele Sottocornola, Fabio Stella and Markus Zanker
Entropy 2021, 23(3), 380; https://doi.org/10.3390/e23030380 - 23 Mar 2021
Cited by 25 | Viewed by 9202
Abstract
The Multi-Armed Bandit (MAB) problem has been extensively studied in order to address real-world challenges related to sequential decision making. In this setting, an agent selects the best action to be performed at time step t, based on the past rewards received from the environment. This formulation implicitly assumes that the expected payoff for each action is kept stationary by the environment through time. Nevertheless, in many real-world applications this assumption does not hold, and the agent has to face a non-stationary environment, that is, one with a changing reward distribution. Thus, we present a new MAB algorithm, named f-Discounted-Sliding-Window Thompson Sampling (f-dsw TS), for non-stationary environments, that is, when the data stream is affected by concept drift. The f-dsw TS algorithm is based on Thompson Sampling (TS) and exploits a discount factor on the reward history and an arm-related sliding window to counteract concept drift in non-stationary environments. We investigate how to combine these two sources of information by means of an aggregation function f(.). In particular, we propose a pessimistic (f=min), an optimistic (f=max), and an averaged (f=mean) version of the f-dsw TS algorithm. A rich set of numerical experiments is performed to evaluate f-dsw TS against both stationary and non-stationary state-of-the-art TS baselines. We exploited synthetic environments (both randomly generated and controlled) to test the MAB algorithms under different types of drift, that is, sudden/abrupt, incremental, gradual and increasing/decreasing drift. Furthermore, we adapt four real-world active-learning tasks to our framework: a prediction task on crimes in the city of Baltimore, a classification task on insect species, a recommendation task on local web news, and a time-series analysis on microbial organisms in the tropical air ecosystem. The f-dsw TS approach emerges as the best-performing MAB algorithm: at least one version of f-dsw TS outperforms the baselines in the synthetic environments, proving its robustness under different concept drift types, and the pessimistic version (f=min) proves the most effective in all real-world tasks.
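The combination of the two information sources can be sketched as follows for Bernoulli rewards: each arm keeps a Beta posterior built from discounted counts and another built from a per-arm sliding window, a sample is drawn from each, and the two samples are merged with the aggregation function f. Hyperparameter values here are our illustrative assumptions.

```python
import random
from collections import deque

class FDswTS:
    """Sketch of the f-dsw TS idea for Bernoulli rewards.

    f combines the discounted-posterior sample with the window-posterior
    sample: min = pessimistic, max = optimistic; an averaged version
    would use f=lambda a, b: (a + b) / 2.
    """

    def __init__(self, n_arms, gamma=0.95, window=50, f=min):
        self.f = f
        self.gamma = gamma
        self.disc = [[0.0, 0.0] for _ in range(n_arms)]  # discounted [succ, fail]
        self.win = [deque(maxlen=window) for _ in range(n_arms)]

    def _draw(self, succ, fail):
        # Beta(1, 1) prior plus (possibly fractional) pseudo-counts.
        return random.betavariate(succ + 1.0, fail + 1.0)

    def select(self):
        scores = []
        for (s, fl), w in zip(self.disc, self.win):
            d_sample = self._draw(s, fl)                      # discounted view
            w_sample = self._draw(sum(w), len(w) - sum(w))    # windowed view
            scores.append(self.f(d_sample, w_sample))
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm, reward):
        for stats in self.disc:        # discount every arm's history
            stats[0] *= self.gamma
            stats[1] *= self.gamma
        self.disc[arm][0] += reward
        self.disc[arm][1] += 1.0 - reward
        self.win[arm].append(reward)
```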

17 pages, 4236 KB  
Article
A Policy for Optimizing Sub-Band Selection Sequences in Wideband Spectrum Sensing
by Yangyi Chen, Shaojing Su and Junyu Wei
Sensors 2019, 19(19), 4090; https://doi.org/10.3390/s19194090 - 21 Sep 2019
Cited by 8 | Viewed by 2529
Abstract
With the development of wireless communication technology, cognitive radio needs to solve the spectrum sensing problem for wideband wireless signals. Due to the performance limitations of electronic components, it is difficult to complete spectrum sensing of a wideband signal all at once. Therefore, the wideband signal has to be split into a set of sub-bands before further signal processing. The sequence in which sub-bands are sensed then becomes one of the important factors that deeply impact wideband spectrum sensing performance. In this paper, we develop a novel approach for sub-band selection through the non-stationary multi-armed bandit (NS-MAB) model. This approach is based on a well-known order-optimal policy for the NS-MAB model called the discounted upper confidence bound (D-UCB) policy. According to different application requirements, we design various discount functions and exploration bonuses for D-UCB, which are taken as the parameters of the proposed policy. Our simulation results demonstrate that the proposed policy provides lower cumulative regret than other existing state-of-the-art policies for sub-band selection in wideband spectrum sensing.
(This article belongs to the Section Sensor Networks)
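The D-UCB index the paper builds on can be sketched as follows. The exponential discount and the square-root exploration bonus are the standard illustrative choices; the paper's contribution is to design different discount functions and bonuses per application, and function names and parameter values here are ours.

```python
import math

def select_ducb(counts, sums, total, xi=2.0):
    """Pick a sub-band by discounted UCB index.

    counts/sums: discounted pull counts and reward sums per sub-band;
    total: discounted overall count n_t = sum(counts).
    """
    for i, n in enumerate(counts):
        if n <= 0:
            return i  # sense each sub-band at least once

    def index(i):
        mean = sums[i] / counts[i]
        # Exploration bonus; xi trades off exploration vs. exploitation.
        bonus = math.sqrt(xi * math.log(total) / counts[i])
        return mean + bonus

    return max(range(len(counts)), key=index)

def discount_then_update(counts, sums, arm, reward, gamma=0.95):
    """Exponential discounting (one choice of discount function), then record."""
    for i in range(len(counts)):
        counts[i] *= gamma
        sums[i] *= gamma
    counts[arm] += 1.0
    sums[arm] += reward
```

Swapping in a different discount function (e.g. a hard sliding window) or bonus only changes `discount_then_update` and the `bonus` line, which is the tuning knob the abstract refers to.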
