Deep Reinforcement Learning: Methods and Applications

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Artificial Intelligence".

Deadline for manuscript submissions: closed (31 October 2020)

Special Issue Editors


Guest Editor
Dr. Thanh Thi Nguyen
School of Information Technology, Deakin University, Burwood, VIC 3125, Australia
Interests: artificial intelligence; deep learning; deep reinforcement learning; data science; big data; cybersecurity; IoT; image processing; robotics; autonomous vehicles; multiagent systems; human–machine integration; defence technologies

Guest Editor
Assoc. Prof. Peter Vamplew
Federation Learning Agents Group, Federation University Australia, Mount Helen, VIC 3350, Australia
Interests: multi-objective reinforcement learning; applications of reinforcement learning; AI safety; AI ethics

Special Issue Information

Dear Colleagues,

Real-world problems are increasingly complex, and applying traditional reinforcement learning (RL) methods to solve them is becoming more and more challenging. Fortunately, deep learning has emerged as a powerful tool: with its great capability for function approximation and representation learning, it is an excellent complement to traditional RL methods. The combination of deep learning and RL, namely deep RL, has produced breakthroughs in developing artificial agents that can perform at a human level. Deep RL methods have been able to solve many complex problems in different domains, from video games (e.g., Atari games, the game of Go, the real-time strategy game StarCraft II, the 3D multiplayer game Capture the Flag in Quake III Arena, and the teamwork game Dota 2) to real-world applications such as robotics, autonomous vehicles, autonomous surgery, biological data mining, drug design, cybersecurity, and the Internet of Things.

This Special Issue focuses on methods and applications of deep RL. We would like to invite papers proposing advanced deep RL methods and/or their novel applications to solve complex problems in various domains.

Dr. Thanh Thi Nguyen
Assoc. Prof. Peter Vamplew
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and written in good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • reinforcement learning
  • deep learning
  • deep Q-network
  • multiagent RL
  • multiobjective RL
  • autonomous vehicles
  • autonomy
  • robotics

Published Papers (5)


Research


14 pages, 755 KiB  
Article
A Deep Reinforcement Learning-Based Power Resource Management for Fuel Cell Powered Data Centers
by Xiaoxuan Hu and Yanfei Sun
Electronics 2020, 9(12), 2054; https://doi.org/10.3390/electronics9122054 - 03 Dec 2020
Cited by 3
Abstract
With the increase in data storage demands, the energy consumption of data centers is also increasing. Energy saving and efficient use of power resources are two key problems to be solved. In this paper, we introduce fuel cells as the energy supply and study power resource use in data center power grids. Considering the limited load-following capability of fuel cells and the power budget fragmentation phenomenon, we transform the two main objectives into a workload distribution optimization problem and use a deep reinforcement learning-based method to solve it. Evaluations with real-world traces demonstrate that this work outperforms state-of-the-art approaches.
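
To make the formulation concrete, here is a minimal sketch of a workload-distribution environment of this kind. All names (FuelCellDCEnv), capacities, and penalty weights are illustrative assumptions, not the authors' implementation:

    import numpy as np

    class FuelCellDCEnv:
        """Toy environment: place incoming jobs on racks while respecting the
        fuel cells' limited load-following and avoiding budget fragmentation."""

        def __init__(self, n_racks=4, rack_cap=50.0, ramp_limit=8.0):
            self.n_racks, self.rack_cap, self.ramp_limit = n_racks, rack_cap, ramp_limit
            self.load = np.zeros(n_racks)  # current power draw per rack
            self.prev_total = 0.0          # total draw at the previous step

        def step(self, rack, job_power):
            # Action: assign a job drawing `job_power` to rack `rack`
            self.load[rack] = min(self.load[rack] + job_power, self.rack_cap)
            total = self.load.sum()
            # Penalty 1: demand change exceeding the fuel cells' ramp rate
            ramp_violation = max(0.0, abs(total - self.prev_total) - self.ramp_limit)
            # Penalty 2: free power budget split into small, unusable slices
            free = self.rack_cap - self.load
            fragmentation = free.sum() - free.max()
            reward = -(ramp_violation + 0.1 * fragmentation)
            self.prev_total = total
            return self.load.copy(), reward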

18 pages, 1972 KiB  
Article
Episodic Self-Imitation Learning with Hindsight
by Tianhong Dai, Hengyan Liu and Anil Anthony Bharath
Electronics 2020, 9(10), 1742; https://doi.org/10.3390/electronics9101742 - 21 Oct 2020
Cited by 10
Abstract
Episodic self-imitation learning, a novel self-imitation algorithm with a trajectory selection module and an adaptive loss function, is proposed to speed up reinforcement learning. Compared to the original self-imitation learning algorithm, which samples good state–action pairs from the experience replay buffer, our agent leverages entire episodes with hindsight to aid self-imitation learning. A selection module is introduced to filter uninformative samples from each episode during the update. The proposed method overcomes the limitations of the standard self-imitation learning algorithm, a transition-based method that performs poorly in continuous control environments with sparse rewards. In the experiments, episodic self-imitation learning is shown to perform better than baseline on-policy algorithms, achieving performance comparable to state-of-the-art off-policy algorithms in several simulated robot control tasks. The trajectory selection module is shown to prevent the agent from learning undesirable hindsight experiences. With the capability of solving sparse-reward problems in continuous control settings, episodic self-imitation learning has the potential to be applied to real-world problems that have continuous action spaces, such as robot guidance and manipulation.
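
For readers unfamiliar with the two ingredients named above, hindsight relabelling of a full episode and trajectory selection, the following is a minimal sketch; the function names, data layout, and filtering criterion are assumptions for illustration, not the paper's code:

    import numpy as np

    def relabel_with_hindsight(episode, tol=0.05):
        """HER-style relabelling: treat the episode's final achieved state as
        the goal, so even a failed rollout yields positive imitation targets."""
        goal = episode[-1]["achieved"]
        relabelled = []
        for t in episode:
            success = np.linalg.norm(t["achieved"] - goal) < tol
            relabelled.append({**t, "goal": goal, "reward": float(success)})
        return relabelled

    def select_informative(episode, value_fn):
        """Trajectory selection: keep transitions whose relabelled reward beats
        the critic's estimate, filtering samples the policy already matches."""
        return [t for t in episode if t["reward"] > value_fn(t["state"], t["goal"])]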

21 pages, 5405 KiB  
Article
Research and Implementation of Intelligent Decision Based on a Priori Knowledge and DQN Algorithms in Wargame Environment
by Yuxiang Sun, Bo Yuan, Tao Zhang, Bojian Tang, Wanwen Zheng and Xianzhong Zhou
Electronics 2020, 9(10), 1668; https://doi.org/10.3390/electronics9101668 - 13 Oct 2020
Cited by 12
Abstract
The reinforcement learning problem of complex action control in multi-player wargames has been a hot research topic in recent years. In this paper, a game system based on turn-based confrontation is designed and implemented with state-of-the-art deep reinforcement learning models. Specifically, we first design a Q-learning algorithm based on the DQN (Deep Q-Network) to achieve intelligent decision-making and model complex game behaviors. Then, an a priori knowledge-based algorithm, PK-DQN (Prior Knowledge-Deep Q-Network), is introduced to improve the DQN algorithm, accelerating its convergence and improving its stability. The experiments validate the PK-DQN algorithm and demonstrate that its performance surpasses that of the conventional DQN algorithm. Furthermore, PK-DQN proves effective at defeating strong rule-based opponents, which provides promising results for the exploration of smart chess and intelligent game deduction.
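
One common way to inject prior knowledge into DQN action selection is to let part of the exploration budget defer to a rule-based move. The sketch below illustrates that general idea only; the mixing scheme and all names are assumptions, not the authors' PK-DQN algorithm:

    import random

    def pk_dqn_action(q_values, rule_action, legal_actions, epsilon=0.1, prior_prob=0.3):
        """Epsilon-greedy DQN choice, except that part of the exploration budget
        defers to a rule-based (a priori knowledge) move."""
        if random.random() < epsilon:
            if random.random() < prior_prob:
                return rule_action                 # exploit domain rules early on
            return random.choice(legal_actions)    # ordinary random exploration
        # Greedy choice restricted to the legal moves of the current turn
        return max(legal_actions, key=lambda a: q_values[a])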

13 pages, 1256 KiB  
Article
Using Data Augmentation Based Reinforcement Learning for Daily Stock Trading
by Yuyu Yuan, Wen Wen and Jincui Yang
Electronics 2020, 9(9), 1384; https://doi.org/10.3390/electronics9091384 - 27 Aug 2020
Cited by 13
Abstract
In algorithmic trading, an adequate training data set is key to making profits. However, stock trading data at daily granularity cannot meet the large data demands of reinforcement learning. To address this problem, we propose a framework named data augmentation based reinforcement learning (DARL), which uses minute-candle data (open, high, low, close) to train the agent. The agent is then used to guide daily stock trading. In this way, we can increase the number of training instances by hundreds of times, which can substantially improve the reinforcement learning effect. However, not all stocks are suitable for this kind of trading, so we propose an access mechanism based on skewness and kurtosis to select stocks that can be traded properly using this algorithm. In our experiments, we find that proximal policy optimization (PPO) is the most stable algorithm for achieving high risk-adjusted returns, while deep Q-learning (DQN) and soft actor-critic (SAC) can beat the market in terms of Sharpe ratio.
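
A minimal sketch of how such a skewness/kurtosis access mechanism and the evaluation metric might look; the thresholds and function names are illustrative assumptions, not the paper's calibrated values:

    import numpy as np
    from scipy.stats import kurtosis, skew

    def admits_stock(daily_returns, max_abs_skew=1.0, max_excess_kurt=5.0):
        """Reject stocks whose return distribution is too asymmetric or
        heavy-tailed for minute-candle training to transfer to daily bars."""
        return (abs(skew(daily_returns)) < max_abs_skew
                and kurtosis(daily_returns) < max_excess_kurt)

    def sharpe_ratio(returns, risk_free=0.0, periods_per_year=252):
        """Annualised Sharpe ratio, the risk-adjusted metric used to compare
        the trained agents against the market."""
        excess = np.asarray(returns) - risk_free / periods_per_year
        return np.sqrt(periods_per_year) * excess.mean() / excess.std()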

Review


21 pages, 1377 KiB  
Review
A Survey of Multi-Task Deep Reinforcement Learning
by Nelson Vithayathil Varghese and Qusay H. Mahmoud
Electronics 2020, 9(9), 1363; https://doi.org/10.3390/electronics9091363 - 22 Aug 2020
Cited by 82
Abstract
Driven by recent technological advancements in artificial intelligence research, deep learning has emerged as a promising representation learning technique across all classes of machine learning, especially in the reinforcement learning arena. This direction has given rise to a new domain, deep reinforcement learning, which combines the representational learning power of deep learning with existing reinforcement learning methods. The inception of deep reinforcement learning has played a vital role in optimizing the performance of reinforcement learning-based intelligent agents with model-free approaches. Although these methods can improve the performance of agents to a great extent, they have mainly been limited to systems adopting reinforcement learning algorithms focused on learning a single task. At the same time, these approaches are relatively data-inefficient, particularly when reinforcement learning agents need to interact with more complex and rich data environments. This is primarily due to the limited applicability of deep reinforcement learning algorithms across related tasks from the same environment. The objective of this paper is to survey the research challenges associated with multi-tasking within the deep reinforcement learning arena and to present the state-of-the-art approaches by comparing and contrasting recent solutions, namely DISTRAL (DIStill & TRAnsfer Learning), IMPALA (Importance Weighted Actor-Learner Architecture), and PopArt, which aim to address core challenges such as scalability, the distraction dilemma, partial observability, catastrophic forgetting, and negative knowledge transfer.
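
To illustrate the shared idea behind DISTRAL-style multi-task approaches, the sketch below writes a toy per-state objective in which each task policy is regularised toward a common distilled policy via a KL term; the coefficients and names are assumptions for exposition, not the surveyed papers' code:

    import numpy as np

    def softmax(logits):
        z = np.exp(logits - logits.max())
        return z / z.sum()

    def distral_style_objective(task_logits, distilled_logits, advantages,
                                c_kl=0.5, c_ent=0.01):
        """Per-state objective for one task: expected advantage under the task
        policy, minus a KL pull toward the shared distilled policy, plus an
        entropy bonus encouraging exploration."""
        p = softmax(task_logits)          # task-specific policy
        q = softmax(distilled_logits)     # shared, distilled policy
        kl = np.sum(p * (np.log(p) - np.log(q)))
        entropy = -np.sum(p * np.log(p))
        return np.dot(p, advantages) - c_kl * kl + c_ent * entropy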
