Search Results (20)

Search Parameters:
Keywords = SARSA RL

25 pages, 6196 KiB  
Article
A Semi-Distributed Scheme for Mode Selection and Resource Allocation in Device-to-Device-Enabled Cellular Networks Using Matching Game and Reinforcement Learning
by Ibrahim Sami Attar, Nor Muzlifah Mahyuddin and M. H. D. Nour Hindia
Telecom 2025, 6(1), 12; https://doi.org/10.3390/telecom6010012 - 13 Feb 2025
Cited by 1 | Viewed by 800
Abstract
Device-to-Device (D2D) communication is a promising technology expected to have a substantial impact on next-generation wireless communication systems. Fifth-generation (5G) and beyond-5G (B5G) networks must handle a growing number of connected devices that demand higher data rates at relatively low power consumption. In this study, we address the joint mode selection, channel assignment, and power allocation problem in a semi-distributed D2D scheme (SD-scheme) underlaying cellular networks. The objective is to enhance the data rate, Spectrum Efficiency (SE), and Energy Efficiency (EE) of the network while preserving the performance of cellular users (CUs) by enforcing a data-rate threshold for each CU. Specifically, we propose a centralized approach to the mode selection and channel assignment problems, employing greedy and matching algorithms, respectively, and a State-Action-Reward-State-Action (SARSA)-based reinforcement learning (RL) algorithm for distributed power allocation. Furthermore, each CU sub-channel is shared among several D2D pairs, and the optimum power is determined for every D2D pair sharing the same sub-channel, taking all types of interference in the network into account. Simulation results show that the proposed scheme outperforms the benchmark schemes in terms of data rate, SE, and EE.
(This article belongs to the Special Issue Advances in Wireless Communication: Applications and Developments)
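Since several results in this list build on the same tabular update, a minimal sketch of SARSA, the on-policy algorithm this paper applies to distributed power allocation, may help. The environment interface (env.reset, env.step, env.actions, e.g. discrete power levels) is a hypothetical stand-in, not the authors' implementation.

```python
import random
from collections import defaultdict

def sarsa(env, episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1):
    """Tabular SARSA: on-policy temporal-difference control."""
    Q = defaultdict(float)  # Q[(state, action)] -> estimated return

    def policy(state):
        # epsilon-greedy selection over the environment's discrete actions
        if random.random() < epsilon:
            return random.choice(env.actions)
        return max(env.actions, key=lambda a: Q[(state, a)])

    for _ in range(episodes):
        s = env.reset()
        a = policy(s)
        done = False
        while not done:
            s_next, r, done = env.step(a)
            a_next = policy(s_next)
            # on-policy target: bootstraps from the action actually taken
            # next, which is what distinguishes SARSA from Q-learning
            target = r + (0.0 if done else gamma * Q[(s_next, a_next)])
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s, a = s_next, a_next
    return Q
```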

19 pages, 6382 KiB  
Article
Tool Condition Monitoring in the Milling Process Using Deep Learning and Reinforcement Learning
by Devarajan Kaliyannan, Mohanraj Thangamuthu, Pavan Pradeep, Sakthivel Gnansekaran, Jegadeeshwaran Rakkiyannan and Alokesh Pramanik
J. Sens. Actuator Netw. 2024, 13(4), 42; https://doi.org/10.3390/jsan13040042 - 30 Jul 2024
Cited by 12 | Viewed by 3357
Abstract
Tool condition monitoring (TCM) is crucial in the machining process to ensure product quality and process efficiency and to minimize downtime. Traditional TCM methods, while effective to a degree, often fall short in real-time adaptability and predictive accuracy. This work aims to advance state-of-the-art predictive maintenance for TCM and to improve tool performance and reliability during milling. It investigates the application of Deep Learning (DL) and Reinforcement Learning (RL) techniques to monitor tool conditions in milling operations. DL models, including Long Short-Term Memory (LSTM) networks and Feed-Forward Neural Networks (FFNN), and RL models, including Q-learning and SARSA, are employed to classify tool conditions from vibration sensor signals. The selected algorithms are evaluated with metrics such as the confusion matrix, recall, precision, F1 score, and Receiver Operating Characteristic (ROC) curves. The results revealed that RL based on SARSA outperformed the other algorithms: the overall classification accuracies for LSTM, FFNN, Q-learning, and SARSA were 94.85%, 98.16%, 98.50%, and 98.66%, respectively. In predicting tool conditions accurately, and thereby enhancing overall process efficiency, SARSA performed best, followed by Q-learning, FFNN, and LSTM. This work contributes to the advancement of TCM systems, highlighting the potential of DL and RL techniques to transform manufacturing processes in the era of Industry 5.0.
(This article belongs to the Special Issue Fault Diagnosis in the Internet of Things Applications)
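For readers reproducing this kind of evaluation, the cited classification metrics are available off the shelf; a small sketch with placeholder labels follows (the class names and arrays are illustrative, not the paper's data):

```python
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             f1_score, precision_score, recall_score)

# hypothetical ground-truth and predicted tool conditions
y_true = ["healthy", "worn", "worn", "healthy", "broken", "worn"]
y_pred = ["healthy", "worn", "healthy", "healthy", "broken", "worn"]

print(confusion_matrix(y_true, y_pred, labels=["healthy", "worn", "broken"]))
print("accuracy :", accuracy_score(y_true, y_pred))
# macro-averaging weights every tool condition equally, a common choice
# for multi-class condition monitoring
print("precision:", precision_score(y_true, y_pred, average="macro"))
print("recall   :", recall_score(y_true, y_pred, average="macro"))
print("f1       :", f1_score(y_true, y_pred, average="macro"))
```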

21 pages, 3651 KiB  
Article
A Reinforcement Learning-Based Multi-Objective Bat Algorithm Applied to Edge Computing Task-Offloading Decision Making
by Chwan-Lu Tseng, Che-Shen Cheng and Yu-Hsuan Shen
Appl. Sci. 2024, 14(12), 5088; https://doi.org/10.3390/app14125088 - 11 Jun 2024
Cited by 1 | Viewed by 1343
Abstract
Amid the escalating complexity of networks, wireless intelligent devices, constrained by energy and resources, bear a growing burden of tasks. Whether a task is offloaded to an edge server or handled locally on the device now significantly impacts network performance. This study focuses on optimizing task-offloading decisions to balance network latency and energy consumption. An advanced learning-based multi-objective bat algorithm, MOBA-CV-SARSA, tailored to the constraints of wireless devices, offers a promising solution for edge computing task offloading. Implemented in C++, MOBA-CV-SARSA demonstrates significant improvements over NSGA-RL-CV and QLPSO-CV, enhancing the hypervolume and diversity-metric indicators by 0.9%, 15.07%, 4.72%, and 0.1%, respectively. Notably, it reduces network energy consumption within acceptable latency thresholds. Moreover, integrating an automatic switching mechanism enables MOBA-CV-SARSA to accelerate convergence while conserving 150.825 W of energy, a substantial 20.24% reduction in overall network energy consumption.
(This article belongs to the Topic Cloud and Edge Computing for Smart Devices)

27 pages, 6109 KiB  
Article
An Improved Dyna-Q Algorithm Inspired by the Forward Prediction Mechanism in the Rat Brain for Mobile Robot Path Planning
by Jing Huang, Ziheng Zhang and Xiaogang Ruan
Biomimetics 2024, 9(6), 315; https://doi.org/10.3390/biomimetics9060315 - 23 May 2024
Cited by 1 | Viewed by 2052
Abstract
Traditional Model-Based Reinforcement Learning (MBRL) algorithms suffer from high computational cost, poor convergence, and weak performance in robot spatial cognition and navigation tasks, and they cannot fully explain how animals adapt quickly to environmental changes and learn a variety of complex tasks. Studies have shown that vicarious trial and error (VTE) and the hippocampal forward prediction mechanism in rats and other mammals can serve as key components of action selection in MBRL, supporting "goal-oriented" behavior. We therefore propose an improved Dyna-Q algorithm, inspired by the hippocampal forward prediction mechanism, to address these problems and to tackle the exploration–exploitation dilemma of Reinforcement Learning (RL). The algorithm repeatedly simulates potential future paths for the mobile robot and dynamically adjusts the sweep length according to decision certainty, thereby guiding action selection. We test the algorithm in two-dimensional maze environments with static and dynamic obstacles. Compared with classic RL algorithms such as State-Action-Reward-State-Action (SARSA) and Dyna-Q, it speeds up spatial cognition and improves the global search ability of path planning. In addition, our method reflects key features of how the brain organizes MBRL to solve difficult tasks such as navigation, providing a new, biologically grounded perspective on spatial cognitive tasks.
(This article belongs to the Special Issue Bioinspired Algorithms)
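To make the Dyna-Q baseline concrete, here is a minimal sketch of the classic algorithm this paper improves on: real experience updates Q directly while a learned one-step model replays simulated transitions. The env interface is assumed for illustration, and the paper's bio-inspired additions (VTE-style forward sweeps with certainty-dependent depth) are not reproduced here.

```python
import random
from collections import defaultdict

def dyna_q(env, episodes=200, planning_steps=10,
           alpha=0.1, gamma=0.95, epsilon=0.1):
    Q = defaultdict(float)
    model = {}  # (s, a) -> (reward, next_state, done): learned 1-step model

    def policy(s):
        if random.random() < epsilon:
            return random.choice(env.actions)
        return max(env.actions, key=lambda a: Q[(s, a)])

    def backup(s, a, r, s_next, done):
        best_next = 0.0 if done else max(Q[(s_next, b)] for b in env.actions)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            a = policy(s)
            s_next, r, done = env.step(a)
            backup(s, a, r, s_next, done)      # direct RL from real experience
            model[(s, a)] = (r, s_next, done)  # model learning
            for _ in range(planning_steps):    # planning: simulated replays
                (ps, pa), (pr, pn, pd) = random.choice(list(model.items()))
                backup(ps, pa, pr, pn, pd)
            s = s_next
    return Q
```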

16 pages, 2753 KiB  
Article
Intelligent Scheduling Based on Reinforcement Learning Approaches: Applying Advanced Q-Learning and State–Action–Reward–State–Action Reinforcement Learning Models for the Optimisation of Job Shop Scheduling Problems
by Atefeh Momenikorbekandi and Maysam Abbod
Electronics 2023, 12(23), 4752; https://doi.org/10.3390/electronics12234752 - 23 Nov 2023
Cited by 14 | Viewed by 3993
Abstract
Flexible job shop scheduling problems (FJSPs) have attracted significant research interest because solving them can considerably increase production efficiency in terms of energy, cost, and time; they are a core component of manufacturing systems and must frequently be re-solved to manage variations in production requirements. In this study, novel reinforcement learning (RL) models, including an advanced Q-learning (QRL) model and an RL-based state–action–reward–state–action (SARSA) model, are proposed to improve FJSP scheduling performance and reduce the total makespan. To depict the problem realities more accurately, two categories of simulated job shops, single-machine and multi-machine, as well as the scheduling of a furnace model, are used to compare the learning behavior and performance of the novel RL models against other algorithms. FJSPs are NP-hard and challenging to solve, and numerous algorithms have been applied to them previously; however, because their key parameters cannot be adjusted dynamically during the computation process, the effectiveness and quality of their solutions fail to meet production standards. Consequently, this research presents the developed RL models. Extensive computational testing and comparisons demonstrate the efficacy and benefits of the proposed SARSA method for solving FJSPs, making it a competitive algorithm for this problem class.
(This article belongs to the Special Issue Application of Machine Learning and Intelligent Systems)
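The criticism above, that key parameters of earlier methods cannot be changed dynamically during computation, is what per-episode schedules address in RL training. A generic sketch follows, assuming an exponential decay with a floor; the form and constants are illustrative, not the paper's schedule.

```python
def schedule(start, floor, decay, episodes):
    """Yield a per-episode parameter value, decaying toward a floor."""
    value = start
    for _ in range(episodes):
        yield value
        value = max(floor, value * decay)  # exponential decay, clipped

# e.g. exploration rate from 1.0 down to 0.05, learning rate 0.5 -> 0.01
epsilons = list(schedule(1.0, 0.05, 0.995, 1000))
alphas = list(schedule(0.5, 0.01, 0.999, 1000))
```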

25 pages, 5498 KiB  
Article
Reinforcement Learning Algorithms for Autonomous Mission Accomplishment by Unmanned Aerial Vehicles: A Comparative View with DQN, SARSA and A2C
by Gonzalo Aguilar Jiménez, Arturo de la Escalera Hueso and Maria J. Gómez-Silva
Sensors 2023, 23(21), 9013; https://doi.org/10.3390/s23219013 - 6 Nov 2023
Cited by 8 | Viewed by 2903
Abstract
Unmanned aerial vehicles (UAVs) can be controlled in diverse ways. One of the most common is through artificial intelligence (AI), which comprises different methods, such as reinforcement learning (RL). This article compares three RL algorithms, DQN as the benchmark, SARSA as a same-family algorithm, and A2C as one with a different structure, on the problem of a UAV navigating from departure point A to endpoint B while avoiding obstacles and, simultaneously, minimizing flight time and distance. Under fixed premises, this investigation reports the performance obtained for this task. A neighborhood environment was selected because it is likely one of the most common areas of use for commercial drones. Taking DQN as the benchmark, and with no prior knowledge of how SARSA or A2C behave in the chosen environment, the comparison showed that DQN was the only algorithm to achieve the target, while SARSA and A2C did not. However, a deeper analysis of the results suggests that a fine-tuned A2C could outperform DQN under certain conditions, reaching the maximum faster with a more straightforward structure.
(This article belongs to the Special Issue Design, Communication, and Control of Autonomous Vehicle Systems)

20 pages, 3218 KiB  
Article
Multi-Agent Optimal Control for Central Chiller Plants Using Reinforcement Learning and Game Theory
by Shunian Qiu, Zhenhai Li, Zhihong Pang, Zhengwei Li and Yinying Tao
Systems 2023, 11(3), 136; https://doi.org/10.3390/systems11030136 - 3 Mar 2023
Cited by 7 | Viewed by 3784
Abstract
To conserve building energy, optimal operation of a building's energy systems, especially heating, ventilation, and air-conditioning (HVAC) systems, is important. This study focuses on optimizing the central chiller plant, which accounts for a large portion of the HVAC system's energy consumption. Classic optimal control methods for central chiller plants are mostly based on system performance models, which take considerable effort and cost to establish; in addition, inevitable model error poses control risks to the applied system. To mitigate the model dependency of HVAC optimal control, reinforcement learning (RL) algorithms have been drawing attention in the HVAC control domain due to their model-free nature. Currently, RL-based optimization of central chiller plants faces several challenges: (1) existing model-free control methods based on RL typically adopt a single-agent scheme, which incurs high training cost and long training periods when optimizing multiple controllable variables for large-scale systems; (2) a multi-agent scheme can overcome this problem, but it requires a proper coordination mechanism to resolve potential conflicts among the RL agents; and (3) previous agent coordination frameworks (described as distributed or decentralized control) are mainly designed for model-based controllers rather than model-free ones. To tackle these problems, this article proposes a multi-agent, model-free optimal control approach for central chiller plants that uses game theory for agent coordination and the RL algorithm SARSA for learning. A data-driven system model built from measured field data of a real HVAC system is used for simulation. The case study results suggest that the short- and long-term energy-saving performance of the proposed approach (over 10% in a cooling season compared to a rule-based baseline controller) is close to that of the classic multi-agent reinforcement learning (MARL) algorithm WoLF-PHC; moreover, because the proposed approach has few tunable parameters, it is more feasible and robust for engineering practice than WoLF-PHC.

14 pages, 1750 KiB  
Article
Security Analysis of Cyber-Physical Systems Using Reinforcement Learning
by Mariam Ibrahim and Ruba Elhafiz
Sensors 2023, 23(3), 1634; https://doi.org/10.3390/s23031634 - 2 Feb 2023
Cited by 13 | Viewed by 4031
Abstract
Future engineering systems with capabilities far exceeding today's levels of autonomy, functionality, usability, dependability, and cyber security are expected to be designed and developed as cyber-physical systems (CPSs). In this paper, the security of CPSs is investigated through a smart-grid case study, using a reinforcement learning (RL)-augmented attack graph to highlight subsystem weaknesses. In particular, the state-action-reward-state-action (SARSA) RL technique is used, in which the agent plays the attacker and an attack graph built for the system serves as the environment. SARSA uses rewards and penalties to identify the worst-case attack scenario; with the highest cumulative reward, an attacker can inflict the most harm on the system with the fewest available actions. The results successfully identified the worst-case attack scenario, with a total reward of 26.9, and pinpointed the most severely damaged subsystems.
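To make the setup concrete, an attack graph can be wrapped as an RL environment in which nodes are compromise states and edges are exploits with damage rewards, so that SARSA learns the highest-reward attack path. The toy graph below is illustrative only, not the paper's smart-grid model:

```python
import random

# state: {action: (next_state, reward)}; rewards reflect damage inflicted
attack_graph = {
    "internet": {"phish": ("workstation", 2.0), "scan_dmz": ("dmz", 1.0)},
    "dmz": {"exploit_web": ("workstation", 3.0)},
    "workstation": {"escalate": ("control_server", 8.0)},
    "control_server": {},  # terminal: maximum damage reached
}

def rollout(policy, start="internet"):
    """Walk the graph under a policy, accumulating the attacker's reward."""
    s, total = start, 0.0
    while attack_graph[s]:
        a = policy(s)
        s, r = attack_graph[s][a]
        total += r
    return total

# a random attacker policy; training SARSA on this environment would
# instead converge to the highest-reward (worst-case) path
print(rollout(lambda s: random.choice(list(attack_graph[s]))))
```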

28 pages, 3576 KiB  
Article
A Study on the Impact of Integrating Reinforcement Learning for Channel Prediction and Power Allocation Scheme in MISO-NOMA System
by Mohamed Gaballa, Maysam Abbod and Ammar Aldallal
Sensors 2023, 23(3), 1383; https://doi.org/10.3390/s23031383 - 26 Jan 2023
Cited by 7 | Viewed by 3604
Abstract
In this study, the influence of adopting Reinforcement Learning (RL) to predict the channel parameters of user devices in a Power-Domain Multi-Input Single-Output Non-Orthogonal Multiple Access (MISO-NOMA) system is investigated. In the channel prediction-based RL approach, a Q-learning algorithm is developed and incorporated into the NOMA system so that the resulting Q-model can predict the channel coefficients for every user device. The purpose of the developed Q-learning procedure is to maximize the received downlink sum rate and decrease the estimation loss. To this end, the Q-algorithm is initialized using different channel statistics and then updated through interaction with the environment in order to approximate the channel coefficients for each device. The predicted parameters are used at the receiver side to recover the desired data. Furthermore, by maximizing the sum rate of the examined user devices, the power factors for each user can be derived analytically to allocate the optimal power factor to every device in the system. This work also investigates how channel prediction based on the developed Q-learning model and the power allocation policy can be combined for multiuser detection in the examined MISO-NOMA system. Simulation results across several performance metrics demonstrate that the developed Q-learning algorithm is competitive for channel estimation when compared to benchmark schemes such as deep learning-based long short-term memory (LSTM), the RL-based actor-critic algorithm, the RL-based state-action-reward-state-action (SARSA) algorithm, and a standard channel estimation scheme based on the minimum mean square error procedure.
(This article belongs to the Topic Machine Learning in Communication Systems and Networks)

17 pages, 3480 KiB  
Article
A Hybrid Reinforcement Learning Algorithm for 2D Irregular Packing Problems
by Jie Fang, Yunqing Rao, Xusheng Zhao and Bing Du
Mathematics 2023, 11(2), 327; https://doi.org/10.3390/math11020327 - 8 Jan 2023
Cited by 28 | Viewed by 8610
Abstract
Packing problems, also known as nesting or bin packing problems, are classic NP-hard problems with high computational complexity. Inspired by classic reinforcement learning (RL), we established a mathematical model for two-dimensional (2D) irregular-piece packing that incorporates the characteristics of 2D irregular pieces. An RL algorithm based on Monte Carlo learning (MC), Q-learning, and SARSA is proposed to solve the 2D irregular-piece packing problem. Additionally, reward-return and strategy-update mechanisms based on piece packing were designed. Finally, standard test cases of irregular pieces were used in experiments to analyze the optimization effect of the algorithm. The experimental results show that the proposed algorithm successfully packs 2D irregular pieces and achieves optimization effects similar to or better than those of some classical heuristic algorithms. The proposed algorithm is an early attempt to use machine learning to solve 2D irregular packing problems. On the one hand, our hybrid RL algorithm provides a basis for subsequent deep reinforcement learning (DRL) approaches to packing problems, which has far-reaching theoretical significance; on the other hand, it has practical significance for improving the utilization rate of raw materials and broadening the application field of machine learning.
(This article belongs to the Section E1: Mathematics and Computer Science)

17 pages, 1303 KiB  
Article
Performance Analysis of Reinforcement Learning Techniques for Augmented Experience Training Using Generative Adversarial Networks
by Smita Mahajan, Shruti Patil, Moinuddin Bhavnagri, Rashmi Singh, Kshitiz Kalra, Bhumika Saini, Ketan Kotecha and Jatinderkumar Saini
Appl. Sci. 2022, 12(24), 12923; https://doi.org/10.3390/app122412923 - 16 Dec 2022
Viewed by 3041
Abstract
This paper analyzes the performance of reinforcement learning (RL) agents trained in environments created by a generative adversarial network (GAN). This is a first step towards the larger goal of developing fast-learning, robust RL agents by leveraging the power of GANs for environment generation. The RL techniques tested were exact Q-learning, approximate Q-learning, approximate SARSA, and a heuristic agent; the agents' task was to learn to play the game Super Mario Bros (SMB). This analysis helps suggest which RL techniques are best suited for augmented experience training with synthetic environments, which would in turn help establish a reinforcement learning framework whose agents learn faster by encountering greater variety during environment exploration.
(This article belongs to the Special Issue Applications of Artificial Intelligence and Machine Learning in Games)

26 pages, 9699 KiB  
Article
A Vision-Based Bio-Inspired Reinforcement Learning Algorithms for Manipulator Obstacle Avoidance
by Abhilasha Singh, Mohamed Shakeel, V. Kalaichelvi and R. Karthikeyan
Electronics 2022, 11(21), 3636; https://doi.org/10.3390/electronics11213636 - 7 Nov 2022
Cited by 4 | Viewed by 3550
Abstract
Path planning for robotic manipulators has proven to be a challenging issue in industrial applications. Despite providing precise waypoints, traditional path planning algorithms require a predefined map and are ineffective in complex, unknown environments. Reinforcement learning techniques can be used where no environmental map exists. For vision-based path planning and obstacle avoidance in assembly-line operations, this study introduces several Reinforcement Learning (RL) algorithms based on a discrete state-action space: Q-Learning, Deep Q Network (DQN), State-Action-Reward-State-Action (SARSA), and Double Deep Q Network (DDQN). With the camera in an eye-to-hand configuration, this work used color-based segmentation to identify the locations of obstacles and the start and goal points. A homogeneous transformation was then used to convert the pixel values into robot coordinates. Furthermore, a performance study of the RL algorithms was carried out by varying the number of episodes, steps per episode, learning rate, and discount factor; genetic algorithms (GA) and particle swarm optimization (PSO) were employed to further tune the training hyperparameters. The length of the path travelled, the average reward, the average number of steps, and the time required to reach the goal point were measured and compared for each test case. Finally, the proposed methodology was evaluated using a live camera that recorded the robot workspace in real time, and the resulting path was executed on a TAL BRABO 5-DOF manipulator. The waypoints obtained via Double DQN showed improved performance, avoiding the obstacles and reaching the goal point smoothly and efficiently.
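As an aside on the vision pipeline, the pixel-to-robot conversion mentioned above is a standard back-projection followed by a homogeneous transformation. A small sketch follows; the calibration matrix and camera intrinsics are made-up placeholders standing in for values that would come from hand-eye calibration:

```python
import numpy as np

# 4x4 homogeneous transform from camera frame to robot base frame (assumed)
T_cam_to_robot = np.array([
    [ 0.0, -1.0,  0.0, 0.25],
    [-1.0,  0.0,  0.0, 0.40],
    [ 0.0,  0.0, -1.0, 0.90],
    [ 0.0,  0.0,  0.0, 1.00],
])

def pixel_to_robot(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with known depth to a camera-frame point,
    then map it into the robot base frame."""
    # pinhole camera model: pixel -> 3D point in the camera frame
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    p_cam = np.array([x, y, depth, 1.0])  # homogeneous coordinates
    return (T_cam_to_robot @ p_cam)[:3]

# hypothetical intrinsics for a 640x480 camera
print(pixel_to_robot(400, 260, depth=0.8, fx=600.0, fy=600.0, cx=320.0, cy=240.0))
```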

19 pages, 1973 KiB  
Article
Secure State Estimation of Cyber-Physical System under Cyber Attacks: Q-Learning vs. SARSA
by Zengwang Jin, Menglu Ma, Shuting Zhang, Yanyan Hu, Yanning Zhang and Changyin Sun
Electronics 2022, 11(19), 3161; https://doi.org/10.3390/electronics11193161 - 1 Oct 2022
Cited by 10 | Viewed by 2827
Abstract
This paper proposes a reinforcement learning (RL) approach to the secure state estimation of cyber-physical systems (CPSs) under denial-of-service (DoS) attacks. The security of a CPS inevitably declines when it faces malicious cyber attacks. To analyze the impact of cyber attacks on CPS performance, a Kalman filter, as an adaptive state estimation technology, is combined with an RL method to evaluate system security, with estimation performance adopted as the evaluation criterion. The evolution of the estimation error covariance under a DoS attack is described as a Markov decision process, to which the RL algorithms can be applied to find optimal countermeasures. Meanwhile, the interactive combat between defender and attacker is regarded as a two-player zero-sum game, whose Nash equilibrium policy exists but must be solved for. Considering energy constraints, the action selection of both sides is restricted through cost functions. The proposed RL approach is designed from three perspectives: the defender, the attacker, and the interactive game between the two. In addition, the Q-learning and state-action-reward-state-action (SARSA) frameworks are investigated separately to analyze the influence of the different RL algorithms. The results show that both algorithms obtain the corresponding optimal policy and the Nash equilibrium policy of the zero-sum game, and a comparative analysis verifies that both Q-learning and SARSA, despite their differences, can be applied effectively to secure state estimation in CPSs.
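The single line that separates the two compared algorithms is the bootstrap target. A generic side-by-side sketch, not the paper's CPS-specific estimator; Q maps (state, action) pairs to values and actions is the discrete action set:

```python
def q_learning_target(Q, r, s_next, actions, gamma):
    # off-policy: bootstrap from the greedy next action,
    # regardless of which action the behavior policy actually takes
    return r + gamma * max(Q[(s_next, a)] for a in actions)

def sarsa_target(Q, r, s_next, a_next, gamma):
    # on-policy: bootstrap from the next action the behavior policy takes
    return r + gamma * Q[(s_next, a_next)]
```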

25 pages, 1580 KiB  
Article
Learning-Based Online QoE Optimization in Multi-Agent Video Streaming
by Yimeng Wang, Mridul Agarwal, Tian Lan and Vaneet Aggarwal
Algorithms 2022, 15(7), 227; https://doi.org/10.3390/a15070227 - 28 Jun 2022
Cited by 8 | Viewed by 3109
Abstract
Video streaming has become a major usage scenario for the Internet. The growing popularity of new applications, such as 4K and 360-degree video, means that network resources must be carefully apportioned among users to achieve optimal Quality of Experience (QoE) and fairness objectives. This yields a challenging online optimization problem, as networks grow increasingly complex and the relevant QoE objectives are often nonlinear functions. Recently, data-driven approaches, deep Reinforcement Learning (RL) in particular, have been successfully applied to network optimization problems by modeling them as Markov decision processes. However, existing multi-agent RL algorithms fail to handle nonlinear objective functions of the agents' rewards. To this end, we leverage MAPG-finite, a policy gradient algorithm designed for multi-agent learning problems with nonlinear objectives. It allows us to optimize bandwidth distribution among multiple agents and to maximize QoE and fairness objectives on video streaming rewards. We compare the implemented MAPG-finite strategy with several baselines, including static, adaptive, and single-agent learning policies. The numerical results show that MAPG-finite significantly outperforms the baselines under different objective functions and in various settings, including both constant- and adaptive-bitrate video. Specifically, our MAPG-finite algorithm improves QoE by 15.27% and fairness by 22.47% compared to the standard SARSA algorithm on a 2000 KB/s link.
(This article belongs to the Special Issue Deep Learning for Internet of Things)

16 pages, 2442 KiB  
Article
Transition Based Discount Factor for Model Free Algorithms in Reinforcement Learning
by Abhinav Sharma, Ruchir Gupta, K. Lakshmanan and Atul Gupta
Symmetry 2021, 13(7), 1197; https://doi.org/10.3390/sym13071197 - 2 Jul 2021
Cited by 5 | Viewed by 3043
Abstract
Reinforcement Learning (RL) enables an agent to learn control policies for achieving its long-term goals. One key parameter of RL algorithms is the discount factor, which scales down future cost in a state's current value estimate. This study introduces and analyzes a transition-based discount factor in two model-free reinforcement learning algorithms, Q-learning and SARSA, and shows their convergence using the theory of stochastic approximation for finite state and action spaces. The transition-based discount induces asymmetric discounting, favouring some transitions over others, which (1) allows faster convergence than the constant-discount-factor variants of these algorithms, as demonstrated by experiments on the Taxi and MountainCar environments, and (2) provides better control over whether the RL agent learns a risk-averse or risk-taking policy, as demonstrated in a Cliff Walking experiment.
(This article belongs to the Section Computer)
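A hedged sketch of the core idea: the discount becomes a function of the observed transition instead of a global constant. The specific gamma(s, a, s') choice below, discounting transitions into risky states more heavily, is a hypothetical member of the family the paper analyzes, not the paper's own construction.

```python
def transition_gamma(s, a, s_next, risky_states,
                     gamma_safe=0.99, gamma_risky=0.7):
    # discount transitions into risky states more aggressively, which biases
    # the learned policy toward risk aversion; swapping the constants would
    # instead favor risk-taking
    return gamma_risky if s_next in risky_states else gamma_safe

def q_update(Q, s, a, r, s_next, actions, alpha, risky_states):
    # standard Q-learning backup, except gamma depends on the transition
    g = transition_gamma(s, a, s_next, risky_states)
    target = r + g * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])
```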
