Search Results (14)

Search Parameters:
Keywords = greedy bandit algorithm

23 pages, 1369 KB  
Article
Evidence-Driven Simulated Data in Reinforcement Learning Training for Personalized mHealth Interventions
by Juan Carlos Caro, Giorgio Galgano, Melissa Muñoz, Jorge Díaz Ramírez and Jorge Maluenda
Appl. Sci. 2026, 16(7), 3463; https://doi.org/10.3390/app16073463 - 2 Apr 2026
Viewed by 278
Abstract
Physical inactivity is a major preventable cause of non-communicable disease and premature mortality. Mobile health interventions can promote physical activity, but their effectiveness depends on the ability to adapt to the user’s context and motivation. Reinforcement learning (RL), particularly contextual bandits (CBs), offers a promising framework for such adaptive personalization. However, in practice, RL-based models face the cold start problem (CSP) due to the lack of initial training data. This study examines whether theory-driven simulated data can mitigate the CSP in training RL systems for personalized physical activity recommendations. A scoping review of 18 empirical studies on the Integrated Behavioral Change Model (IBC) provided population parameters for key constructs, which were used to simulate 2000 virtual users via multivariate modeling and structural equation calibration. A CB algorithm with an ε-greedy policy was trained on this dataset and compared with data from a real-world pilot using the Apptivate mHealth web-app (n = 588). Results showed close alignment between simulated and real behaviors. Our findings demonstrate that behaviorally informed synthetic data can effectively be used to train RL algorithms, offering an interpretable, sustainable, scalable, and privacy-safe solution to the CSP in personalized digital health interventions. Full article
(This article belongs to the Special Issue Health Informatics: Human Health and Health Care Services)
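Several results in this list, including this one, train an ε-greedy policy. As background, here is a minimal ε-greedy bandit sketch in Python (an illustration of the general technique only, not the paper's contextual-bandit implementation; all names are made up):

```python
import random

class EpsilonGreedyBandit:
    """Minimal epsilon-greedy bandit: explore with probability epsilon, else exploit."""
    def __init__(self, n_arms, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = [0] * n_arms    # pulls per arm
        self.values = [0.0] * n_arms  # running mean reward per arm

    def select_arm(self):
        if random.random() < self.epsilon:
            return random.randrange(len(self.counts))  # explore a random arm
        return max(range(len(self.counts)), key=lambda a: self.values[a])  # exploit

    def update(self, arm, reward):
        self.counts[arm] += 1
        # incremental mean update
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
```

With probability ε the learner explores a random arm; otherwise it plays the arm with the best running mean, which is the exploration/exploitation trade-off these abstracts refer to.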

27 pages, 520 KB  
Article
QiMARL: Quantum-Inspired Multi-Agent Reinforcement Learning Strategy for Efficient Resource Energy Distribution in Nodal Power Stations
by Sapthak Mohajon Turjya, Anjan Bandyopadhyay, M. Shamim Kaiser and Kanad Ray
AI 2025, 6(9), 209; https://doi.org/10.3390/ai6090209 - 1 Sep 2025
Cited by 3 | Viewed by 3317
Abstract
The coupling of quantum computing with multi-agent reinforcement learning (MARL) provides an exciting direction to tackle intricate decision-making tasks in high-dimensional spaces. This work introduces a new quantum-inspired multi-agent reinforcement learning (QiMARL) model, utilizing quantum parallelism to achieve learning efficiency and scalability improvement. The QiMARL model is tested on an energy distribution task, which optimizes power distribution between generating and demanding nodal power stations. We compare the convergence time, reward performance, and scalability of QiMARL with traditional Multi-Armed Bandit (MAB) and Multi-Agent Reinforcement Learning methods, such as Greedy, Upper Confidence Bound (UCB), Thompson Sampling, MADDPG, QMIX, and PPO methods with a comprehensive ablation study. Our findings show that QiMARL yields better performance in high-dimensional systems, decreasing the number of training epochs needed for convergence while enhancing overall reward maximization. We also compare the algorithm’s computational complexity, indicating that QiMARL is more scalable to high-dimensional quantum environments. This research opens the door to future studies of quantum-enhanced reinforcement learning (RL) with potential applications to energy optimization, traffic management, and other multi-agent coordination problems. Full article
(This article belongs to the Special Issue Advances in Quantum Computing and Quantum Machine Learning)

29 pages, 1715 KB  
Article
Multi-Armed Bandit Approaches for Location Planning with Dynamic Relief Supplies Allocation Under Disaster Uncertainty
by Jun Liang, Zongjia Zhang and Yanpeng Zhi
Smart Cities 2025, 8(1), 5; https://doi.org/10.3390/smartcities8010005 - 25 Dec 2024
Cited by 3 | Viewed by 2363
Abstract
Natural disasters (e.g., floods, earthquakes) significantly impact citizens, economies, and the environment worldwide. Due to their sudden onset, devastating effects, and high uncertainty, it is crucial for emergency departments to take swift action to minimize losses. Among these actions, planning the locations of relief supply distribution centers and dynamically allocating supplies is paramount, as governments must prioritize citizens’ safety and basic living needs following disasters. To address this challenge, this paper develops a three-layer emergency logistics network to manage the flow of emergency materials, from warehouses to transfer stations to disaster sites. A bi-objective, multi-period stochastic integer programming model is proposed to solve the emergency location, distribution, and allocation problem under uncertainty, focusing on three key decisions: transfer station selection, upstream emergency material distribution, and downstream emergency material allocation. We introduce a multi-armed bandit algorithm, named the Geometric Greedy algorithm, to optimize transfer station planning while accounting for subsequent dynamic relief supply distribution and allocation in a stochastic environment. The new algorithm is compared with two widely used multi-armed bandit algorithms: the ϵ-Greedy algorithm and the Upper Confidence Bound (UCB) algorithm. A case study in the Futian District of Shenzhen, China, demonstrates the practicality of our model and algorithms. The results show that the Geometric Greedy algorithm excels in both computational efficiency and convergence stability. This research offers valuable guidelines for emergency departments in optimizing the layout and flow of emergency logistics networks. Full article
(This article belongs to the Section Applied Science and Humanities for Smart Cities)
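One of the baselines this paper compares against, the Upper Confidence Bound (UCB) algorithm, has a compact standard form (UCB1). A hedged sketch of that baseline follows; the paper's own Geometric Greedy algorithm is not detailed in the abstract and is not reproduced here:

```python
import math

def ucb1_select(counts, values, t):
    """UCB1 arm choice: mean reward plus exploration bonus sqrt(2*ln(t)/n)."""
    for arm, n in enumerate(counts):
        if n == 0:          # pull every arm once before using the bonus
            return arm
    return max(range(len(counts)),
               key=lambda a: values[a] + math.sqrt(2 * math.log(t) / counts[a]))
```

The bonus term shrinks as an arm accumulates pulls, so rarely tried arms are revisited even when their current mean looks poor.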

16 pages, 2850 KB  
Article
Multi-Armed Bandit-Based User Network Node Selection
by Qinyan Gao and Zhidong Xie
Sensors 2024, 24(13), 4104; https://doi.org/10.3390/s24134104 - 24 Jun 2024
Cited by 2 | Viewed by 2130
Abstract
In the scenario of an integrated space–air–ground emergency communication network, users encounter the challenge of rapidly identifying the optimal network node amidst the uncertainty and stochastic fluctuations of network states. This study introduces a Multi-Armed Bandit (MAB) model and proposes an optimization algorithm leveraging dynamic variance sampling (DVS). The algorithm posits that the prior distribution of each node’s network state conforms to a normal distribution, and by constructing the distribution’s expected value and variance, it maximizes the utilization of sample data, thereby maintaining an equilibrium between data exploitation and the exploration of the unknown. Theoretical substantiation is provided to illustrate that the Bayesian regret associated with the algorithm exhibits sublinear growth. Empirical simulations corroborate that the algorithm in question outperforms traditional ε-greedy, Upper Confidence Bound (UCB), and Thompson sampling algorithms in terms of higher cumulative rewards, diminished total regret, accelerated convergence rates, and enhanced system throughput. Full article
(This article belongs to the Section Physical Sensors)
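The normal-prior posterior sampling idea behind DVS resembles Gaussian Thompson sampling, one of the baselines compared above. A minimal sketch of that baseline (illustrative only; the paper's DVS update itself is not reproduced here):

```python
import random

def gaussian_thompson_select(means, variances):
    """Gaussian Thompson sampling: draw one sample from each arm's
    posterior N(mean, variance) and play the arm with the largest draw."""
    draws = [random.gauss(m, v ** 0.5) for m, v in zip(means, variances)]
    return max(range(len(draws)), key=lambda a: draws[a])
```

Arms with wide posteriors occasionally produce large draws and get explored; as variance shrinks, selection concentrates on the best mean.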

15 pages, 4801 KB  
Article
An Intelligent Control and a Model Predictive Control for a Single Landing Gear Equipped with a Magnetorheological Damper
by Quang-Ngoc Le, Hyeong-Mo Park, Yeongjin Kim, Huy-Hoang Pham, Jai-Hyuk Hwang and Quoc-Viet Luong
Aerospace 2023, 10(11), 951; https://doi.org/10.3390/aerospace10110951 - 11 Nov 2023
Cited by 11 | Viewed by 2783
Abstract
Aircraft landing gear equipped with a magnetorheological (MR) damper is a semi-active system that contains nonlinear behavior, disturbances, uncertainties, and delay times that can have a huge impact on the landing’s performance. To solve this problem, this paper adopts two types of controllers, which are an intelligent controller and a model predictive controller, for a landing gear equipped with an MR damper to improve the landing gear performance considering response time in different landing cases. A model predictive controller is built based on the mathematical model of the landing gear system. An intelligent controller based on a neural network is designed and trained using a greedy bandit algorithm to improve the shock absorber efficiency at different aircraft masses and sink speeds. In this MR damper, the response time is assumed to be constant at 20 ms, which is similar to the response time of the commercial MR damper. To verify the efficiency of the proposed controllers, numerical simulations compared with a passive damper and a skyhook controller in different landing cases are executed. The major finding indicates that the suggested controller performs better in various landing scenarios than other controllers in terms of shock absorber effectiveness and adaptability. Full article

25 pages, 1938 KB  
Article
Combinatorial MAB-Based Joint Channel and Spreading Factor Selection for LoRa Devices
by Ikumi Urabe, Aohan Li, Minoru Fujisawa, Song-Ju Kim and Mikio Hasegawa
Sensors 2023, 23(15), 6687; https://doi.org/10.3390/s23156687 - 26 Jul 2023
Cited by 10 | Viewed by 3545
Abstract
Long-Range (LoRa) devices have been deployed in many Internet of Things (IoT) applications due to their ability to communicate over long distances with low power consumption. The scalability and communication performance of LoRa systems are highly dependent on the spreading factor (SF) and channel allocations. In particular, it is important to set the SF appropriately according to the distance between the LoRa device and the gateway, since the signal reception sensitivity and bit rate both depend on the chosen SF and are in a trade-off relationship. In addition, given the recent surge in the number of LoRa devices, the scalability of LoRa systems is also greatly affected by the channels that the devices use for communication. Our previous study demonstrated that lightweight decentralized learning-based joint channel and SF-selection methods can make appropriate decisions with low computational complexity and power consumption. However, the effect of device placement on communication performance in a practical larger-scale LoRa system has not been studied. Hence, to clarify this effect, in this paper we implemented and evaluated learning-based joint channel and SF-selection methods in a practical LoRa system. In these methods, the channel and SF are decided based only on ACKnowledgment (ACK) information. The learning methods evaluated in this paper were the Tug of War dynamics, Upper Confidence Bound 1, and ϵ-greedy algorithms. Moreover, to capture the interdependence of the channel and SF, we propose a combinatorial multi-armed bandit-based joint channel and SF-selection method, in which each combination of channel and SF is treated as a single arm; by contrast, the independent methods evaluated in our previous work treat the channel and SF as separate arms.
The experimental results show the following. First, the combinatorial methods achieve a higher frame success rate (FSR) and fairness than the independent methods. In addition, the FSR can be improved by joint channel and SF selection compared to SF selection alone. Moreover, the channel and SF selection depends heavily on device placement. Full article
(This article belongs to the Section Communications)
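The combinatorial formulation described above treats each channel/SF combination as one arm. A tiny sketch of the idea (hypothetical channel indices; SF7 through SF12 are the usual LoRa spreading factors; the feedback signal here stands in for ACK-based success rates):

```python
import random
from itertools import product

channels = [0, 1, 2]          # hypothetical channel indices
sfs = [7, 8, 9, 10, 11, 12]   # LoRa spreading factors SF7..SF12

# Combinatorial formulation: one arm per (channel, SF) pair, so the
# interdependence of the two choices is learned jointly.
arms = list(product(channels, sfs))

def epsilon_greedy_pair(values, epsilon=0.1):
    """Pick a (channel, SF) arm; `values` maps arm -> estimated ACK success rate."""
    if random.random() < epsilon:
        return random.choice(arms)                 # explore
    return max(arms, key=lambda arm: values[arm])  # exploit best pair
```

Compared with running two independent bandits (3 + 6 = 9 separate arms), the joint space has 3 × 6 = 18 arms: more to explore, but pairwise effects become learnable.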

14 pages, 384 KB  
Article
An ϵ-Greedy Multiarmed Bandit Approach to Markov Decision Processes
by Isa Muqattash and Jiaqiao Hu
Stats 2023, 6(1), 99-112; https://doi.org/10.3390/stats6010006 - 1 Jan 2023
Cited by 2 | Viewed by 2439
Abstract
We present REGA, a new adaptive-sampling-based algorithm for the control of finite-horizon Markov decision processes (MDPs) with very large state spaces and small action spaces. We apply a variant of the ϵ-greedy multiarmed bandit algorithm to each stage of the MDP in a recursive manner, thus computing an estimation of the “reward-to-go” value at each stage of the MDP. We provide a finite-time analysis of REGA. In particular, we provide a bound on the probability that the approximation error exceeds a given threshold, where the bound is given in terms of the number of samples collected at each stage of the MDP. We empirically compare REGA against another sampling-based algorithm called RASA by running simulations against the SysAdmin benchmark problem with 2¹⁰ states. The results show that REGA and RASA achieved similar performance. Moreover, REGA and RASA empirically outperformed an implementation of the algorithm that uses the “original” ϵ-greedy algorithm that commonly appears in the literature. Full article
(This article belongs to the Special Issue Feature Paper Special Issue: Reinforcement Learning)
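The stage-wise recursion described here, applying an ϵ-greedy bandit at each stage to estimate the reward-to-go, can be sketched roughly as follows. This is a loose illustration under assumed interfaces, not REGA itself; `step` is a hypothetical sampling model:

```python
import random

def reward_to_go(state, stage, horizon, actions, step,
                 n_samples=20, epsilon=0.2):
    """Recursive epsilon-greedy estimate of the reward-to-go from `state`.

    `step(state, action) -> (reward, next_state)` is a hypothetical
    generative model of the MDP; each stage runs a small epsilon-greedy
    bandit over the actions and recurses to value the next stage.
    """
    if stage == horizon:
        return 0.0
    counts = {a: 0 for a in actions}
    values = {a: 0.0 for a in actions}
    for _ in range(n_samples):
        if random.random() < epsilon:
            a = random.choice(actions)                     # explore
        else:
            a = max(actions, key=lambda x: values[x])      # exploit
        reward, nxt = step(state, a)
        q = reward + reward_to_go(nxt, stage + 1, horizon, actions, step,
                                  n_samples, epsilon)
        counts[a] += 1
        values[a] += (q - values[a]) / counts[a]           # incremental mean
    return max(values.values())
```

Note the cost grows as n_samples to the power of the horizon, so this naive sketch only runs for short horizons; the paper's contribution includes the finite-time analysis, which this sketch does not capture.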

18 pages, 514 KB  
Article
Multi-Node Joint Power Allocation Algorithm Based on Hierarchical Game Learning in Underwater Acoustic Sensor Networks
by Hui Wang, Yao Huang, Fang Luo and Liejun Yang
Remote Sens. 2022, 14(24), 6215; https://doi.org/10.3390/rs14246215 - 8 Dec 2022
Cited by 9 | Viewed by 2799
Abstract
In order to improve the overall service quality of the network and reduce the level of network interference, power allocation has become one of the research focuses in the field of underwater acoustic communication in recent years. Aiming at the issue of power allocation when channel information is difficult to obtain in complex underwater acoustic communication networks, a completely distributed game learning algorithm is proposed that does not require any prior channel information and direct information exchange between nodes. Specifically, the power allocation problem is constructed as a multi-node multi-armed bandit (MAB) game model. Then, considering nodes as agents and multi-node networks as multi-agent networks, a power allocation algorithm based on a softmax-greedy action selection strategy is proposed. In order to improve the learning efficiency of the agent, reduce the learning cost, and mine the historical reward information, a learning algorithm based on the two-layer hierarchical game learning (HGL) strategy is further proposed. Finally, the simulation results show that the algorithm not only shows good convergence speed and stability but also can adapt to a harsh and complex network environment and has a certain tolerance for incomplete channel information acquisition. Full article
(This article belongs to the Special Issue Underwater Communication and Networking)

31 pages, 451 KB  
Article
Multi-Gear Bandits, Partial Conservation Laws, and Indexability
by José Niño-Mora
Mathematics 2022, 10(14), 2497; https://doi.org/10.3390/math10142497 - 18 Jul 2022
Cited by 4 | Viewed by 2586
Abstract
This paper considers what we propose to call multi-gear bandits, which are Markov decision processes modeling a generic dynamic and stochastic project fueled by a single resource and which admit multiple actions representing gears of operation naturally ordered by their increasing resource consumption. The optimal operation of a multi-gear bandit aims to strike a balance between project performance costs or rewards and resource usage costs, which depend on the resource price. A computationally convenient and intuitive optimal solution is available when such a model is indexable, meaning that its optimal policies are characterized by a dynamic allocation index (DAI), a function of state–action pairs representing critical resource prices. Motivated by the lack of general indexability conditions and efficient index-computing schemes, and focusing on the infinite-horizon finite-state and -action discounted case, we present a verification theorem ensuring that, if a model satisfies two proposed PCL-indexability conditions with respect to a postulated family of structured policies, then it is indexable and such policies are optimal, with its DAI being given by a marginal productivity index computed by a downshift adaptive-greedy algorithm in AN steps, with A+1 actions and N states. The DAI is further used as the basis of a new index policy for the multi-armed multi-gear bandit problem. Full article
(This article belongs to the Section D1: Probability and Statistics)

16 pages, 1785 KB  
Article
BER Minimization by User Pairing in Downlink NOMA Using Laser Chaos Decision-Maker
by Masaki Sugiyama, Aohan Li, Zengchao Duan, Makoto Naruse and Mikio Hasegawa
Electronics 2022, 11(9), 1452; https://doi.org/10.3390/electronics11091452 - 30 Apr 2022
Cited by 3 | Viewed by 3073
Abstract
In next-generation wireless communication systems, non-orthogonal multiple access (NOMA) has been recognized as an essential technology for improving spectrum efficiency. NOMA allows multiple users to transmit data using the same resource block simultaneously, given proper user pairing. Most pairing schemes, however, require prior information, such as the users’ location information, making prompt user pairing difficult to realize. To enable real-time operation without prior information in NOMA, a bandit algorithm using chaotically oscillating time series, which we refer to as the laser chaos decision-maker, was previously demonstrated. However, that scheme did not consider the detailed communication processes, e.g., modulation, error correction coding, etc. In this study, in order to adapt the laser chaos decision-maker to real communication systems, we propose a user pairing scheme based on acknowledgment (ACK) and negative acknowledgment (NACK) information that takes detailed communication channels into account. Furthermore, based on insights gained from an analysis of parameter dependencies, we introduce an adaptive pairing method to minimize the bit error rate (BER) of the NOMA system under study. The numerical results show that the proposed method achieves superior performance to the traditional pairing schemes, i.e., the Conventional-NOMA (C-NOMA) and Unified Channel Gain Difference (UCGD-NOMA) pairing schemes, and the ϵ-greedy-based user pairing scheme. As the cell radius of the NOMA system gets smaller, the BER advantage of our proposed scheme grows. Specifically, our proposed scheme can decrease the BER from 10⁻¹ to 10⁻⁵ compared to the conventional schemes when the cell radius is 400 m. Full article
(This article belongs to the Special Issue Advances in Intelligence Networking and Computing)

19 pages, 731 KB  
Article
Enhanced Dynamic Spectrum Access in UAV Wireless Networks for Post-Disaster Area Surveillance System: A Multi-Player Multi-Armed Bandit Approach
by Amr Amrallah, Ehab Mahmoud Mohamed, Gia Khanh Tran and Kei Sakaguchi
Sensors 2021, 21(23), 7855; https://doi.org/10.3390/s21237855 - 25 Nov 2021
Cited by 19 | Viewed by 4207
Abstract
Modern wireless networks are notorious for being very dense, uncoordinated, and selfish, especially with greedy user demands. This leads to a critical scarcity problem in spectrum resources. The Dynamic Spectrum Access (DSA) system is considered a promising solution to this scarcity problem. With the aid of Unmanned Aerial Vehicles (UAVs), a post-disaster surveillance system is implemented using a Cognitive Radio Network (CRN). UAVs are distributed over the disaster area to capture live images of the damaged area and send them to the disaster management center. The CRN enables the UAVs to utilize a portion of the spectrum of the Electronic Toll Collection (ETC) gates operating in the same area. In this paper, a joint transmission power selection, data-rate maximization, and interference mitigation problem is addressed. Considering all these conflicting parameters, the problem is investigated as a budget-constrained multi-player multi-armed bandit (MAB) problem. The whole process is carried out in a decentralized manner, with no information exchanged between UAVs. To achieve this, two power-budget-aware MAB (PBA-MAB) algorithms, namely the upper confidence bound (PBA-UCB) algorithm and the Thompson sampling (PBA-TS) algorithm, are proposed to select the transmission power value efficiently. The proposed PBA-MAB algorithms show outstanding performance over random power value selection in terms of achievable data rate. Full article

27 pages, 1813 KB  
Article
Sequential Learning of Principal Curves: Summarizing Data Streams on the Fly
by Le Li and Benjamin Guedj
Entropy 2021, 23(11), 1534; https://doi.org/10.3390/e23111534 - 18 Nov 2021
Cited by 3 | Viewed by 3297
Abstract
When confronted with massive data streams, summarizing data with dimension reduction methods such as PCA raises theoretical and algorithmic pitfalls. A principal curve acts as a nonlinear generalization of PCA, and the present paper proposes a novel algorithm to automatically and sequentially learn principal curves from data streams. We show that our procedure is supported by regret bounds with optimal sublinear remainder terms. A greedy local search implementation (called slpc, for sequential learning principal curves) that incorporates both sleeping experts and multi-armed bandit ingredients is presented, along with its regret computation and performance on synthetic and real-life data. Full article
(This article belongs to the Special Issue Approximate Bayesian Inference)

21 pages, 673 KB  
Article
A Fast-Pivoting Algorithm for Whittle’s Restless Bandit Index
by José Niño-Mora
Mathematics 2020, 8(12), 2226; https://doi.org/10.3390/math8122226 - 15 Dec 2020
Cited by 6 | Viewed by 4492
Abstract
The Whittle index for restless bandits (two-action semi-Markov decision processes) provides an intuitively appealing optimal policy for controlling a single generic project that can be active (engaged) or passive (rested) at each decision epoch, and which can change state while passive. It further provides a practical heuristic priority-index policy for the computationally intractable multi-armed restless bandit problem, which has been widely applied over the last three decades in multifarious settings, yet mostly restricted to project models with a one-dimensional state. This is due in part to the difficulty of establishing indexability (existence of the index) and of computing the index for projects with large state spaces. This paper draws on the author’s prior results on sufficient indexability conditions and an adaptive-greedy algorithmic scheme for restless bandits to obtain a new fast-pivoting algorithm that computes the n Whittle index values of an n-state restless bandit by performing, after an initialization stage, n steps that entail (2/3)n³ + O(n²) arithmetic operations. This algorithm also draws on the parametric simplex method, and is based on elucidating the pattern of parametric simplex tableaux, which makes it possible to exploit special structure to substantially simplify and reduce the complexity of simplex pivoting steps. A numerical study demonstrates substantial runtime speed-ups versus alternative algorithms. Full article
(This article belongs to the Special Issue Applied Probability)

13 pages, 2990 KB  
Article
Recommendation of Workplaces in a Coworking Building: A Cyber-Physical Approach Supported by a Context-Aware Multi-Agent System
by Luis Gomes, Carlos Almeida and Zita Vale
Sensors 2020, 20(12), 3597; https://doi.org/10.3390/s20123597 - 25 Jun 2020
Cited by 13 | Viewed by 3761
Abstract
Recommender systems are able to suggest the most suitable items to a given user, taking into account the user’s and item’s data. Currently, these systems are offered almost everywhere in the online world, such as in e-commerce websites, newsletters, or video platforms. To improve recommendations, the user’s context should be considered to provide more accurate algorithms able to achieve higher payoffs. In this paper, we propose a pre-filtering recommendation system that considers the context of a coworking building and suggests the best workplaces to a user. A cyber-physical context-aware multi-agent system is used to monitor the building and feed the pre-filtering process using fuzzy logic. Recommendations are made by a multi-armed bandit algorithm, using ϵ-greedy and upper confidence bound methods. The paper presents the main results of simulations for one, two, three, and five years to illustrate the use of the proposed system. Full article
(This article belongs to the Special Issue Sensor-Based, Context-Aware Recommender Systems)
