
Federated Reinforcement Learning for Training Control Policies on Multiple IoT Devices

1 Department of Interdisciplinary Program in Creative Engineering, Korea University of Technology and Education, Cheonan 31253, Korea
2 Department of Computer Science Engineering, Korea University of Technology and Education, Cheonan 31253, Korea
* Author to whom correspondence should be addressed.
Sensors 2020, 20(5), 1359; https://doi.org/10.3390/s20051359
Received: 12 February 2020 / Revised: 25 February 2020 / Accepted: 28 February 2020 / Published: 2 March 2020
(This article belongs to the Special Issue Machine Learning for IoT Applications and Digital Twins)
Reinforcement learning has recently been studied in various fields and is also used to optimally control IoT devices, supporting the expansion of Internet connectivity beyond the usual standard devices. In this paper, we enable multiple reinforcement learning agents to learn optimal control policies on their own IoT devices of the same type but with slightly different dynamics. For such multiple IoT devices, there is no guarantee that an agent that interacts with only one IoT device and learns the optimal control policy will also control another IoT device well. Therefore, we may need to apply independent reinforcement learning to each IoT device individually, which is costly and time-consuming. To solve this problem, we propose a new federated reinforcement learning architecture in which each agent working on its independent IoT device shares its learning experience (i.e., the gradient of the loss function) with the others and transfers mature policy model parameters to the other agents, which then accelerate their own learning by using those mature parameters. We incorporate the actor–critic proximal policy optimization (Actor–Critic PPO) algorithm into each agent in the proposed collaborative architecture and propose an efficient procedure for the gradient sharing and the model transfer. Using multiple rotary inverted pendulum devices interconnected via a network switch, we demonstrate that the proposed federated reinforcement learning scheme can effectively facilitate the learning process for multiple IoT devices and that the learning speed can be faster if more agents are involved.
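The gradient-sharing step described above can be illustrated with a minimal sketch: each agent computes a local gradient against its own device's dynamics, the gradients are averaged across agents, and every agent applies the same averaged update to its policy parameters. The function names, the plain gradient-descent update, and the toy gradient values below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def average_gradients(gradients):
    """Element-wise mean of the per-agent gradient vectors."""
    return np.mean(gradients, axis=0)

def apply_update(params, grad, lr=0.01):
    """Plain gradient-descent step on the shared policy parameters."""
    return params - lr * grad

# Three agents, each with a local gradient estimate for the same policy.
params = np.zeros(4)
local_grads = [np.array([0.2, -0.1, 0.0, 0.4]),
               np.array([0.4, -0.3, 0.2, 0.0]),
               np.array([0.0, 0.1, 0.1, 0.2])]

shared = average_gradients(local_grads)
params = apply_update(params, shared)
print(params)  # every agent now holds the same updated parameters
```

In the paper's setting, each local gradient would come from an Actor–Critic PPO loss computed on one rotary inverted pendulum, and a mature agent's full parameter vector could additionally be copied into a newly started agent to warm-start its training.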
Keywords: Actor–Critic PPO; federated reinforcement learning; multi-device control
MDPI and ACS Style

Lim, H.-K.; Kim, J.-B.; Heo, J.-S.; Han, Y.-H. Federated Reinforcement Learning for Training Control Policies on Multiple IoT Devices. Sensors 2020, 20, 1359. https://doi.org/10.3390/s20051359

