Search Results (25)

Search Parameters:
Keywords = Hindsight Experience Replay (HER)

23 pages, 42651 KiB  
Article
Research on High-Precision Motion Planning of Large Multi-Arm Rock Drilling Robot Based on Multi-Strategy Sampling Rapidly Exploring Random Tree*
by Qiaoyu Xu and Yansong Lin
Sensors 2025, 25(9), 2654; https://doi.org/10.3390/s25092654 - 22 Apr 2025
Cited by 1 | Viewed by 674
Abstract
To address the optimal motion planning problem for multi-arm rock drilling robots, this paper introduces a high-precision motion planning method based on Multi-Strategy Sampling RRT* (MSS-RRT*). A dual Jacobi iterative inverse solution method, coupled with a forward kinematics error compensation model, is introduced to dynamically correct target positions, improving end-effector positioning accuracy. A multi-strategy sampling mechanism is constructed by integrating DRL position sphere sampling, spatial random sampling, and goal-oriented sampling. This mechanism flexibly applies the three sampling methods at different stages of path planning, significantly improving the adaptability and search efficiency of the RRT* algorithm. In particular, DRL position sphere sampling is prioritized during the initial phase, effectively reducing the number of invalid sampling points. For training a three-arm DRL model with the twin delayed deep deterministic policy gradient (TD3) algorithm, the Hindsight Experience Replay-Obstacle Arm Transfer (HER-OAT) method is used for data replay. A cylindrical bounding box method effectively prevents collisions between arms. The experimental results show that the proposed method improves motion planning accuracy by 94.15% compared to a single Jacobi iteration. MSS-RRT* can plan a superior path in a shorter time, with the planning time under optimal path conditions being only 20.71% of that required by Informed-RRT*, and the path length reduced by 21.58% compared to Quick-RRT* under the same time constraints. Full article
(This article belongs to the Section Sensors and Robotics)
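
As a rough illustration of the multi-strategy sampling idea in the abstract above, the sketch below mixes a DRL-suggested sphere, uniform workspace sampling, and goal-oriented sampling according to the planning stage. The stage thresholds, sphere parameters, and function names are illustrative assumptions, not the authors' implementation.

```python
import math
import random

def sample_state(iteration, max_iter, goal, bounds, sphere_center, sphere_radius):
    """Pick a sample using one of three strategies, chosen by planning stage (sketch)."""
    progress = iteration / max_iter
    if progress < 0.3:
        # Early phase: bias samples into a sphere suggested by a DRL policy,
        # which cuts down on obviously invalid samples.
        direction = [random.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(d * d for d in direction)) or 1.0
        r = sphere_radius * random.random() ** (1.0 / 3.0)  # uniform within the sphere volume
        return [c + r * d / norm for c, d in zip(sphere_center, direction)]
    if random.random() < 0.1:
        # Occasionally sample the goal directly (goal-oriented sampling).
        return list(goal)
    # Otherwise fall back to uniform spatial sampling over the workspace.
    return [random.uniform(lo, hi) for lo, hi in bounds]
```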

21 pages, 4799 KiB  
Article
Data-Efficient Reinforcement Learning Framework for Autonomous Flight Based on Real-World Flight Data
by Uicheon Lee, Seonah Lee and Kyonghoon Kim
Drones 2025, 9(4), 264; https://doi.org/10.3390/drones9040264 - 31 Mar 2025
Viewed by 936
Abstract
Recently, autonomous flight has emerged as a key technology in the aerospace and defense sectors; however, traditional code-based autonomous flight systems face limitations in complex environments. Although reinforcement learning offers an alternative, its practical application in real-world settings is hindered by the substantial data requirements. In this study, we develop a framework that integrates a Generative Adversarial Network (GAN) and Hindsight Experience Replay (HER) into model-based reinforcement learning to enhance data efficiency and accuracy. We compared the proposed framework against existing algorithms in actual quadcopter control. In the comparative experiment, we demonstrated an improvement of up to 70.59% in learning speed, clearly highlighting the impact of the environmental model. To the best of our knowledge, this study is the first where a GAN and HER are combined with model-based reinforcement learning, and it is expected to contribute significantly to the practical application of reinforcement learning in autonomous flight. Full article
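
The data-efficiency idea above rests on generating extra experience from a learned environment model. The sketch below shows the general pattern of branching short synthetic rollouts from real flight states; `dynamics_model` stands in for the paper's GAN-based model, and all names, lengths, and interfaces are assumptions rather than the authors' code.

```python
import random

def augment_with_model(real_buffer, dynamics_model, policy, rollout_len=5, n_starts=32):
    """Branch short synthetic rollouts from states observed in real flight data (sketch)."""
    synthetic = []
    for _ in range(n_starts):
        s = random.choice(real_buffer)[0]        # start from a real recorded state
        for _ in range(rollout_len):
            a = policy(s)
            s_next, r = dynamics_model(s, a)     # learned model predicts next state and reward
            synthetic.append((s, a, r, s_next))
            s = s_next
    return list(real_buffer) + synthetic
```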

19 pages, 10296 KiB  
Article
Extended Maximum Actor–Critic Framework Based on Policy Gradient Reinforcement for System Optimization
by Jung-Hyun Kim, Yong-Hoon Choi, You-Rak Choi, Jae-Hyeok Jeong and Min-Suk Kim
Appl. Sci. 2025, 15(4), 1828; https://doi.org/10.3390/app15041828 - 11 Feb 2025
Viewed by 833
Abstract
Recently, significant research efforts have been directed toward leveraging Artificial Intelligence for sensor data processing and system control. In particular, it is essential to determine optimal paths and trajectories from sensor data for effective control systems. For instance, model-predictive control based on Proportional-Integral-Derivative models is intuitive, efficient, and provides outstanding control performance. However, challenges in tracking persist, which calls for active research on integrating and optimizing the control system with Machine Learning. Specifically, Reinforcement Learning, a branch of Machine Learning, has been used in several research fields to solve optimal control problems. In this paper, we propose an Extended Maximum Actor–Critic, a Reinforcement Learning-based method that combines the advantages of both value and policy learning to enhance the stability of actor–critic training for system control optimization. The proposed method integrates the actor and the maximized actor in the learning process to evaluate and identify actions with the highest value, facilitating effective exploration. Additionally, to enhance the efficiency and robustness of agent learning, we propose Prioritized Hindsight Experience Replay, combining the advantages of Prioritized Experience Replay (PER) and Hindsight Experience Replay. To verify this, we performed evaluations and experiments examining training stability in MuJoCo, a physics simulator widely used for Reinforcement Learning research. The proposed Prioritized Hindsight Experience Replay method significantly improves learning compared with the standard replay buffer and PER in simulated tasks such as the simple HalfCheetah-v4 and the complex Ant-v4. Moreover, Prioritized Hindsight Experience Replay achieves a higher success rate than PER in FetchReach-v2, demonstrating the effectiveness of the proposed method in more complex reward environments. Full article
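
For orientation, the sketch below shows one plausible way to combine the two ingredients named above: episodes are stored both with their original goal and relabelled with the achieved goal (HER), and minibatches are drawn in proportion to a priority that would later be updated from TD errors (PER). The field names, priority exponent, and insertion priority are assumptions, not the paper's exact formulation.

```python
import random

class PrioritizedHindsightBuffer:
    """Toy PER + HER buffer: hindsight-relabelled transitions with priority sampling."""

    def __init__(self, alpha=0.6):
        self.data, self.priorities, self.alpha = [], [], alpha

    def add_episode(self, episode, compute_reward):
        # Store each transition twice: as collected, and relabelled with the achieved goal.
        achieved_goal = episode[-1]["achieved"]
        for t in episode:
            self._add((t["obs"], t["action"], t["reward"], t["next_obs"], t["goal"]))
            r_new = compute_reward(t["achieved"], achieved_goal)
            self._add((t["obs"], t["action"], r_new, t["next_obs"], achieved_goal))

    def _add(self, transition, priority=1.0):
        self.data.append(transition)
        self.priorities.append(priority ** self.alpha)

    def sample(self, batch_size):
        # Sample indices in proportion to priority.
        idx = random.choices(range(len(self.data)), weights=self.priorities, k=batch_size)
        return idx, [self.data[i] for i in idx]

    def update_priorities(self, indices, td_errors):
        for i, err in zip(indices, td_errors):
            self.priorities[i] = (abs(err) + 1e-6) ** self.alpha
```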

23 pages, 4137 KiB  
Article
Mars Exploration: Research on Goal-Driven Hierarchical DQN Autonomous Scene Exploration Algorithm
by Zhiguo Zhou, Ying Chen, Jiabao Yu, Bowen Zu, Qian Wang, Xuehua Zhou and Junwei Duan
Aerospace 2024, 11(8), 692; https://doi.org/10.3390/aerospace11080692 - 22 Aug 2024
Cited by 1 | Viewed by 1765
Abstract
The non-deterministic, large-scale navigation environment of a Mars exploration mission involves a large action space and many environmental states. Traditional reinforcement learning algorithms, which can only obtain rewards at target points and obstacles, encounter the problems of reward sparsity and dimension explosion, making training too slow or even infeasible. This work proposes a goal-driven hierarchical deep Q-network (GDH-DQN) algorithm that is better suited to mobile robots exploring, navigating, and avoiding obstacles without a map. The algorithm model is designed in two layers: the lower layer provides behavioral strategies to achieve short-term goals, and the upper layer provides selection strategies over multiple short-term goals. Known position nodes are used as short-term goals to guide the mobile robot forward and achieve long-term obstacle avoidance goals. Hierarchical execution not only simplifies tasks but also effectively solves the problems of reward sparsity and dimension explosion. In addition, each layer of the algorithm integrates a Hindsight Experience Replay mechanism to improve performance, make full use of the goal-driven function of the nodes, and avoid misleading the agent through complex processes and blind spots in reward function design. The agent adjusts the number of model layers according to the number of short-term goals, further improving the efficiency and adaptability of the algorithm. Experimental results show that, compared with the hierarchical DQN method, the navigation success rate of the GDH-DQN algorithm is significantly improved, making it more suitable for unknown scenarios such as Mars exploration. Full article
(This article belongs to the Section Astronautics & Space Science)
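
A minimal sketch of the two-layer control loop described above is given below: an upper-level policy picks a known position node as the next short-term goal and a goal-conditioned lower-level DQN drives toward it. The object interfaces (`upper_q.select_goal`, `lower_q.act`) and the subgoal test are stand-ins, not the paper's API.

```python
def hierarchical_episode(env, upper_q, lower_q, candidate_nodes, max_steps=50):
    """Run one episode of the goal-driven, two-layer loop (sketch)."""
    state = env.reset()
    done = False
    while not done:
        # Upper layer: choose a known position node as the short-term goal.
        subgoal = upper_q.select_goal(state, candidate_nodes)
        # Lower layer: goal-conditioned behaviour toward that node.
        for _ in range(max_steps):
            action = lower_q.act(state, subgoal)
            state, _, done, info = env.step(action)
            if info.get("reached_subgoal") or done:
                break
    return state
```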

29 pages, 4569 KiB  
Article
Energy-Aware Hierarchical Reinforcement Learning Based on the Predictive Energy Consumption Algorithm for Search and Rescue Aerial Robots in Unknown Environments
by M. Ramezani and M. A. Amiri Atashgah
Drones 2024, 8(7), 283; https://doi.org/10.3390/drones8070283 - 23 Jun 2024
Cited by 7 | Viewed by 2311
Abstract
Aerial robots (drones) offer critical advantages in missions where human participation is impeded due to hazardous conditions. Among these, search and rescue missions in disaster-stricken areas are particularly challenging due to the dynamic and unpredictable nature of the environment, often compounded by the lack of reliable environmental models and limited ground system communication. In such scenarios, autonomous aerial robots’ operation becomes essential. This paper introduces a novel hierarchical reinforcement learning-based algorithm to address the critical limitation of the aerial robot’s battery life. Central to our approach is the integration of a long short-term memory (LSTM) model, designed for precise battery consumption prediction. This model is incorporated into our HRL framework, empowering a high-level controller to set feasible and energy-efficient goals for a low-level controller. By optimizing battery usage, our algorithm enhances the aerial robot’s ability to deliver rescue packs to multiple survivors without the frequent need for recharging. Furthermore, we augment our HRL approach with hindsight experience replay at the low level to improve its sample efficiency. Full article
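
The energy-aware element above amounts to letting a learned consumption predictor veto goals the battery cannot support. The sketch below shows that filtering step; the predictor interface, safety margin, and ranking rule are illustrative assumptions standing in for the paper's LSTM-based model.

```python
def feasible_goals(candidate_goals, state, battery_level, predictor, margin=0.1):
    """Keep only goals whose predicted energy cost leaves a safety margin (sketch)."""
    feasible = []
    for g in candidate_goals:
        predicted_cost = predictor(state, g)   # e.g. an LSTM estimate of battery consumption
        if predicted_cost + margin <= battery_level:
            feasible.append((g, predicted_cost))
    # Prefer cheaper goals when several are feasible.
    return [g for g, _ in sorted(feasible, key=lambda item: item[1])]
```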

19 pages, 1961 KiB  
Article
Biped Robots Control in Gusty Environments with Adaptive Exploration Based DDPG
by Yilin Zhang, Huimin Sun, Honglin Sun, Yuan Huang and Kenji Hashimoto
Biomimetics 2024, 9(6), 346; https://doi.org/10.3390/biomimetics9060346 - 8 Jun 2024
Cited by 1 | Viewed by 2105
Abstract
As technology rapidly evolves, the application of bipedal robots in various environments has widely expanded. These robots, compared to their wheeled counterparts, exhibit a greater degree of freedom and a higher complexity in control, making the challenge of maintaining balance and stability under changing wind speeds particularly intricate. Overcoming this challenge is critical as it enables bipedal robots to sustain more stable gaits during outdoor tasks, thereby increasing safety and enhancing operational efficiency in outdoor settings. To transcend the constraints of existing methodologies, this research introduces an adaptive bio-inspired exploration framework for bipedal robots facing wind disturbances, which is based on the Deep Deterministic Policy Gradient (DDPG) approach. This framework allows the robots to perceive their bodily states through wind force inputs and adaptively modify their exploration coefficients. Additionally, to address the convergence challenges posed by sparse rewards, this study incorporates Hindsight Experience Replay (HER) and a reward-reshaping strategy to provide safer and more effective training guidance for the agents. Simulation outcomes reveal that robots utilizing this advanced method can more swiftly explore behaviors that contribute to stability in complex conditions, and demonstrate improvements in training speed and walking distance over traditional DDPG algorithms. Full article
(This article belongs to the Special Issue Design and Control of a Bio-Inspired Robot: 2nd Edition)
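
As a rough sketch of the adaptive exploration idea above, the snippet below scales DDPG's Gaussian exploration noise with the sensed wind force. The linear scaling law and its constants are illustrative assumptions, not the authors' exact rule.

```python
import numpy as np

def adaptive_action(actor, state, wind_force, base_sigma=0.1, k=0.05, sigma_max=0.5):
    """Add exploration noise whose scale grows with the wind disturbance (sketch)."""
    sigma = min(base_sigma + k * abs(wind_force), sigma_max)
    action = actor(state)                                   # deterministic DDPG action
    noise = np.random.normal(0.0, sigma, size=np.shape(action))
    return np.clip(action + noise, -1.0, 1.0), sigma
```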

17 pages, 1685 KiB  
Article
Active Collision Avoidance for Robotic Arm Based on Artificial Potential Field and Deep Reinforcement Learning
by Qiaoyu Xu, Tianle Zhang, Kunpeng Zhou, Yansong Lin and Wenhao Ju
Appl. Sci. 2024, 14(11), 4936; https://doi.org/10.3390/app14114936 - 6 Jun 2024
Cited by 3 | Viewed by 2313
Abstract
To address the local minimum issue commonly encountered in active collision avoidance using artificial potential fields (APF), this paper presents a novel algorithm that integrates APF with deep reinforcement learning (DRL) for robotic arms. Firstly, to improve the training efficiency of DRL for the collision avoidance problem, Hindsight Experience Replay (HER) was enhanced by adjusting the positions of obstacles, resulting in Hindsight Experience Replay for Collision Avoidance (HER-CA). Subsequently, a robotic arm collision avoidance action network model was trained based on the Twin Delayed Deep Deterministic Policy Gradient (TD3) and HER-CA methods. Further, a full-body collision avoidance potential field model of the robotic arm was established based on the artificial potential field. Lastly, the trained action network model was used to guide APF in real-time collision avoidance planning. Comparative experiments between HER and HER-CA were conducted: the model trained with HER-CA improves the average success rate of the collision avoidance task by about 10% compared with the model trained with HER. A collision avoidance simulation was also conducted on the rock drilling robotic arm, confirming the effectiveness of the guided APF method. Full article
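
To make the HER-CA idea concrete, the sketch below relabels a failed transition by pushing the recorded obstacle just outside the collision margin and resetting the collision penalty, so the stored trajectory can be reused as a collision-free example. The field names, clearance margin, and sparse reward convention are assumptions, not the paper's exact procedure.

```python
import numpy as np

def relabel_obstacle(transition, clearance=0.05):
    """Return a copy of the transition with the obstacle moved out of collision (sketch)."""
    ee_pos = np.asarray(transition["ee_pos"])            # end-effector position
    obs_pos = np.asarray(transition["obstacle_pos"])
    offset = ee_pos - obs_pos
    dist = np.linalg.norm(offset)
    min_dist = transition["obstacle_radius"] + clearance
    if dist < min_dist:                                   # collision or near-collision
        obs_pos = ee_pos - offset / (dist + 1e-8) * min_dist
    relabelled = dict(transition, obstacle_pos=obs_pos.tolist())
    relabelled["reward"] = 0.0                            # collision penalty removed after relabelling
    return relabelled
```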

16 pages, 4477 KiB  
Article
Autonomous Driving of Mobile Robots in Dynamic Environments Based on Deep Deterministic Policy Gradient: Reward Shaping and Hindsight Experience Replay
by Minjae Park, Chaneun Park and Nam Kyu Kwon
Biomimetics 2024, 9(1), 51; https://doi.org/10.3390/biomimetics9010051 - 13 Jan 2024
Cited by 3 | Viewed by 2898
Abstract
In this paper, we propose a reinforcement learning-based end-to-end learning method for the autonomous driving of a mobile robot in a dynamic environment with obstacles. Applying two additional techniques for reinforcement learning simultaneously helps the mobile robot find an optimal policy to reach the destination without collisions. First, the multifunctional reward-shaping technique guides the agent toward the goal by utilizing information about the destination and obstacles. Next, employing the hindsight experience replay technique to address the experience imbalance caused by the sparse reward problem assists the agent in finding the optimal policy. We validated the proposed technique in both simulation and real-world environments. To assess the effectiveness of the proposed method, we compared results across five different cases. Full article
(This article belongs to the Special Issue Artificial Intelligence for Autonomous Robots 2024)
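
The hindsight step mentioned above is the standard goal-relabelling trick: a failed episode is stored a second time as if the state it actually reached had been the destination, so the sparse reward becomes informative. The sketch below uses the common "final" strategy; field names and the reward callback are assumptions.

```python
def hindsight_relabel(episode, compute_reward):
    """Return extra transitions whose goal is the episode's final achieved position (sketch)."""
    final_achieved = episode[-1]["achieved_goal"]
    relabelled = []
    for t in episode:
        relabelled.append({
            "obs": t["obs"],
            "action": t["action"],
            "next_obs": t["next_obs"],
            "goal": final_achieved,
            "reward": compute_reward(t["achieved_goal"], final_achieved),
        })
    return relabelled
```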

17 pages, 16552 KiB  
Article
Event-Triggered Hierarchical Planner for Autonomous Navigation in Unknown Environment
by Changhao Chen, Bifeng Song, Qiang Fu, Dong Xue and Lei He
Drones 2023, 7(12), 690; https://doi.org/10.3390/drones7120690 - 27 Nov 2023
Cited by 2 | Viewed by 2747
Abstract
End-to-end deep neural network (DNN)-based motion planners have shown great potential in high-speed autonomous UAV flight. Yet, most existing methods only employ a single high-capacity DNN, which typically lacks generalization ability and suffers from high sample complexity. We propose a novel event-triggered hierarchical planner (ETHP), which exploits the bi-level optimization nature of the navigation task to achieve both efficient training and improved optimality. Specifically, we learn a depth-image-based end-to-end motion planner in a hierarchical reinforcement learning framework, where the high-level DNN is a reactive collision avoidance rerouter triggered by the clearance distance, and the low-level DNN is a goal-chaser that generates the heading and velocity references in real time. Our training considers the field-of-view constraint and explores the bi-level structural flexibility to promote the spatio–temporal optimality of planning. Moreover, we design simple yet effective rules to collect hindsight experience replay buffers, yielding more high-quality samples and faster convergence. The experiments show that, compared with a single-DNN baseline planner, ETHP significantly improves the success rate and generalizes better to the unseen environment. Full article

26 pages, 2535 KiB  
Article
Multiple UAS Traffic Planning Based on Deep Q-Network with Hindsight Experience Replay and Economic Considerations
by Shao Xuan Seah and Sutthiphong Srigrarom
Aerospace 2023, 10(12), 980; https://doi.org/10.3390/aerospace10120980 - 22 Nov 2023
Cited by 1 | Viewed by 1720
Abstract
This paper explores the use of deep reinforcement learning in solving the multi-agent aircraft traffic planning (individual paths) and collision avoidance problem for multiple UAS, such as a cargo drone network. Specifically, the Deep Q-Network (DQN) with Hindsight Experience Replay framework is adopted and trained on a three-dimensional state space that represents a congested urban environment with dynamic obstacles. Through formalising a Markov decision process (MDP), various flight and control parameters are varied between training simulations to study their effects on agent performance. Both fully observable MDPs (FOMDPs) and partially observable MDPs (POMDPs) are formulated to understand the role of shaping reward signals on training performance. While conventional traffic planning and optimisation techniques are evaluated based on path length or time, this paper aims to incorporate economic analysis by considering tangible and intangible sources of cost, such as the cost of energy, the value of time (VOT) and the value of reliability (VOR). By comparing outcomes from an integration of multiple cost sources, this paper is better able to gauge the impact of various parameters on efficiency. To further explore the feasibility of multiple UAS traffic planning, such as cargo drone networks, the trained agents are also subjected to multi-agent point-to-point and hub-and-spoke network environments. In these simulations, delivery orders are generated using a discrete event simulator whose arrival rate is varied to investigate the effect of travel demand on economic costs. Simulation results point to the importance of reward-signal engineering, as reward signals play a crucial role in shaping the learned behaviour. The results also reflect an increase in costs for environments where congestion and arrival time uncertainty arise because of the presence of other agents in the network. Full article
(This article belongs to the Collection Air Transportation—Operations and Management)
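
The economic evaluation described above boils down to converting energy, time, and reliability into a single monetary figure. The sketch below shows one plausible aggregation; the rates and the use of arrival-time standard deviation as the reliability proxy are illustrative assumptions, not values from the paper.

```python
def mission_cost(energy_kwh, flight_time_h, arrival_time_std_h,
                 energy_price=0.25, vot=30.0, vor=45.0):
    """Total cost = energy cost + time cost + reliability cost, in currency units (sketch)."""
    return (energy_kwh * energy_price         # tangible: electricity
            + flight_time_h * vot             # intangible: value of time
            + arrival_time_std_h * vor)       # intangible: value of reliability
```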

15 pages, 2636 KiB  
Article
A Path-Planning Method Based on Improved Soft Actor-Critic Algorithm for Mobile Robots
by Tinglong Zhao, Ming Wang, Qianchuan Zhao, Xuehan Zheng and He Gao
Biomimetics 2023, 8(6), 481; https://doi.org/10.3390/biomimetics8060481 - 10 Oct 2023
Cited by 12 | Viewed by 4045
Abstract
The path planning problem has gained more attention due to the gradual popularization of mobile robots. Reinforcement learning enables mobile robots to navigate through an environment containing obstacles and effectively plan their path by interacting with the environment, even when the environment is unfamiliar. Consequently, we provide a refined deep reinforcement learning algorithm that builds upon the soft actor-critic (SAC) algorithm, incorporating the concept of maximum entropy for path planning. The objective of this strategy is to mitigate the constraints inherent in conventional reinforcement learning, enhance the efficacy of the learning process, and accommodate intricate situations. In reinforcement learning, two significant issues arise: inadequate incentives and inefficient sample use during the training phase. To address these challenges, the hindsight experience replay (HER) mechanism is introduced, which aims to enhance algorithm performance by effectively reusing past experiences. Simulation studies demonstrate that the enhanced algorithm exhibits superior performance in comparison with the pre-existing method. Full article
(This article belongs to the Special Issue Biomimicry for Optimization, Control, and Automation)

32 pages, 8120 KiB  
Article
End-to-End AUV Local Motion Planning Method Based on Deep Reinforcement Learning
by Xi Lyu, Yushan Sun, Lifeng Wang, Jiehui Tan and Liwen Zhang
J. Mar. Sci. Eng. 2023, 11(9), 1796; https://doi.org/10.3390/jmse11091796 - 14 Sep 2023
Cited by 7 | Viewed by 2369
Abstract
This study aims to solve the problems of sparse reward, single policy, and poor environmental adaptability in the local motion planning task of autonomous underwater vehicles (AUVs). We propose a two-layer deep deterministic policy gradient algorithm-based end-to-end perception–planning–execution method to overcome the challenges associated with training and learning in end-to-end approaches that directly output control forces. In this approach, the state set is established based on the environment information, the action set is established based on the motion characteristics of the AUV, and the control execution force set is established based on the control constraints. The mapping relations between the sets are trained using deep reinforcement learning, enabling the AUV to perform the corresponding action in the current state and thereby accomplish tasks in an end-to-end manner. Furthermore, we introduce the hindsight experience replay (HER) method in the perception–planning mapping process to enhance stability and sample efficiency during training. Finally, we conduct simulation experiments encompassing planning, execution, and end-to-end performance evaluation. Simulation training demonstrates that the proposed method exhibits improved decision-making capabilities and real-time obstacle avoidance during planning. Compared to global planning, the end-to-end algorithm comprehensively considers constraints in the AUV planning process, resulting in more realistic AUV actions that are gentler and more stable, with tracking errors kept under control. Full article
(This article belongs to the Special Issue AI for Navigation and Path Planning of Marine Vehicles)

19 pages, 8072 KiB  
Article
Stable and Efficient Reinforcement Learning Method for Avoidance Driving of Unmanned Vehicles
by Sun-Ho Jang, Woo-Jin Ahn, Yu-Jin Kim, Hyung-Gil Hong, Dong-Sung Pae and Myo-Taeg Lim
Electronics 2023, 12(18), 3773; https://doi.org/10.3390/electronics12183773 - 6 Sep 2023
Cited by 1 | Viewed by 1718
Abstract
Reinforcement learning (RL) has demonstrated considerable potential in solving challenges across various domains, notably in autonomous driving. Nevertheless, implementing RL in autonomous driving comes with its own set of difficulties, such as the overestimation phenomenon, extensive learning time, and sparse reward problems. Although solutions like hindsight experience replay (HER) have been proposed to alleviate these issues, the direct utilization of RL in autonomous vehicles remains constrained due to the intricate fusion of information and the possibility of system failures during the learning process. In this paper, we present a novel RL-based autonomous driving system technology that combines obstacle-dependent Gaussian (ODG) RL, soft actor-critic (SAC), and meta-learning algorithms. Our approach addresses key issues in RL, including the overestimation phenomenon and sparse reward problems, by incorporating prior knowledge derived from the ODG algorithm. With these solutions in place, the ultimate aim of this work is to improve the performance of reinforcement learning and develop a swift, stable, and robust learning method for implementing autonomous driving systems that can effectively adapt to various environments and overcome the constraints of direct RL utilization in autonomous vehicles. We evaluated our proposed algorithm on official F1 circuits, using high-fidelity racing simulations with complex dynamics. The results demonstrate exceptional performance, with our method achieving up to 89% faster learning speed compared to existing algorithms in these environments. Full article

15 pages, 3028 KiB  
Article
Double Broad Reinforcement Learning Based on Hindsight Experience Replay for Collision Avoidance of Unmanned Surface Vehicles
by Jiabao Yu, Jiawei Chen, Ying Chen, Zhiguo Zhou and Junwei Duan
J. Mar. Sci. Eng. 2022, 10(12), 2026; https://doi.org/10.3390/jmse10122026 - 18 Dec 2022
Cited by 1 | Viewed by 2757
Abstract
Although broad reinforcement learning (BRL) provides a more intelligent autonomous decision-making method for the collision avoidance problem of unmanned surface vehicles (USVs), the algorithm still suffers from over-estimation and has difficulty converging quickly due to the sparse reward problem over a large sea area. To overcome this dilemma, we propose double broad reinforcement learning based on hindsight experience replay (DBRL-HER) for the collision avoidance system of USVs to improve the efficiency and accuracy of decision-making. The algorithm decouples the two steps of target action selection and target Q-value calculation to form the double broad reinforcement learning method, and then adopts hindsight experience replay to allow the agent to learn from the experience of failure, greatly improving sample utilization efficiency. After training in a grid environment, the collision avoidance success rate of the proposed algorithm was 31.9 percentage points higher than that of the deep Q-network (DQN) and 24.4 percentage points higher than that of BRL. A high-fidelity Unity 3D simulation platform was also designed to simulate the movement of USVs, and an experiment on the platform fully verified the effectiveness of the proposed algorithm. Full article
(This article belongs to the Special Issue Ship Collision Risk Assessment)
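
The decoupling described above follows the double-Q pattern: the online network selects the next action and the target network evaluates it, which curbs over-estimation. The PyTorch-style sketch below shows that target computation in isolation; tensor shapes and network interfaces are assumptions, and the paper applies the idea to broad networks rather than deep ones.

```python
import torch

def double_q_target(reward, next_state, done, q_online, q_target, gamma=0.99):
    """Bootstrapped target with decoupled action selection and evaluation (sketch)."""
    with torch.no_grad():
        next_action = q_online(next_state).argmax(dim=1, keepdim=True)   # selection
        next_value = q_target(next_state).gather(1, next_action)         # evaluation
        return reward + gamma * (1.0 - done) * next_value.squeeze(1)
```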

15 pages, 1329 KiB  
Article
Prioritized Hindsight with Dual Buffer for Meta-Reinforcement Learning
by Sofanit Wubeshet Beyene and Ji-Hyeong Han
Electronics 2022, 11(24), 4192; https://doi.org/10.3390/electronics11244192 - 15 Dec 2022
Cited by 2 | Viewed by 2692
Abstract
Sharing prior knowledge across multiple robotic manipulation tasks is a challenging research topic. Although state-of-the-art deep reinforcement learning (DRL) algorithms have shown immense success in single robotic tasks, it is still challenging to extend these algorithms to be applied directly to resolve multi-task manipulation problems. This is mostly due to the problems associated with efficient exploration in high-dimensional state and continuous action spaces. Furthermore, in multi-task scenarios, the sparse reward and sample inefficiency of DRL algorithms are exacerbated. Therefore, we propose a method to increase the sample efficiency of the soft actor-critic (SAC) algorithm and extend it to a multi-task setting. The agent learns a prior policy from two structurally similar tasks and adapts the policy to a target task. We propose prioritized hindsight with a dual experience replay buffer to improve the data storage and sampling technique, which, in turn, assists the agent in performing structured exploration that leads to sample efficiency. The proposed method separates the experience replay buffer into two buffers containing real trajectories and hindsight trajectories, respectively, to reduce the bias introduced by the hindsight trajectories. Moreover, we utilize high-reward transitions from previous tasks to assist the network in adapting easily to the new task. We demonstrate the proposed method on several manipulation tasks using a 7-DoF robotic arm in RLBench. The experimental results show that the proposed method outperforms vanilla SAC in both single-task and multi-task settings. Full article
(This article belongs to the Special Issue Advanced Machine Learning for Intelligent Robotics)
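
The dual-buffer idea above can be pictured as two replay stores, one for real trajectories and one for hindsight-relabelled ones, mixed at a fixed ratio at sampling time so relabelled data cannot dominate a batch. The sketch below shows that mechanism; the mixing ratio and class interface are illustrative assumptions.

```python
import random

class DualReplayBuffer:
    """Toy dual buffer keeping real and hindsight transitions separate (sketch)."""

    def __init__(self, hindsight_fraction=0.5):
        self.real, self.hindsight = [], []
        self.hindsight_fraction = hindsight_fraction

    def add(self, transition, relabelled=False):
        (self.hindsight if relabelled else self.real).append(transition)

    def sample(self, batch_size):
        n_hind = min(int(batch_size * self.hindsight_fraction), len(self.hindsight))
        n_real = min(batch_size - n_hind, len(self.real))
        batch = random.sample(self.real, n_real) + random.sample(self.hindsight, n_hind)
        random.shuffle(batch)
        return batch
```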
