Search Results (4)

Search Parameters:
Keywords = hindsight and prioritized experience

23 pages, 42651 KiB  
Article
Research on High-Precision Motion Planning of Large Multi-Arm Rock Drilling Robot Based on Multi-Strategy Sampling Rapidly Exploring Random Tree*
by Qiaoyu Xu and Yansong Lin
Sensors 2025, 25(9), 2654; https://doi.org/10.3390/s25092654 - 22 Apr 2025
Cited by 1 | Viewed by 670
Abstract
To address the optimal motion planning problem for multi-arm rock drilling robots, this paper introduces a high-precision motion planning method based on Multi-Strategy Sampling RRT* (MSS-RRT*). A dual Jacobi iterative inverse solution method, coupled with a forward kinematics error compensation model, is introduced to dynamically correct target positions, improving end-effector positioning accuracy. A multi-strategy sampling mechanism is constructed by integrating DRL position sphere sampling, spatial random sampling, and goal-oriented sampling. This mechanism flexibly applies the three sampling methods at different stages of path planning, significantly improving the adaptability and search efficiency of the RRT* algorithm. In particular, DRL position sphere sampling is prioritized during the initial phase, effectively reducing the number of invalid sampling points. For training the three-arm DRL model with the twin delayed deep deterministic policy gradient (TD3) algorithm, the Hindsight Experience Replay-Obstacle Arm Transfer (HER-OAT) method is used for data replay, and a cylindrical bounding box method effectively prevents collisions between arms. The experimental results show that the proposed method improves motion planning accuracy by 94.15% compared to a single Jacobi iteration. MSS-RRT* can plan a superior path in a shorter duration: the planning time under optimal path conditions is only 20.71% of that required by Informed-RRT*, and the path length is reduced by 21.58% compared to Quick-RRT* under the same time constraints.
(This article belongs to the Section Sensors and Robotics)
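
As a rough illustration of the multi-strategy sampling idea described in the abstract (not the authors' implementation), the Python sketch below switches between sampling inside a DRL-predicted position sphere, uniform spatial sampling, and goal-oriented sampling depending on the planning stage. The stage threshold, goal bias, and sphere parameters are assumptions for illustration only.

```python
import random

# Hypothetical sketch of a multi-strategy sampler in the spirit of MSS-RRT*.
# Strategy names, thresholds, and parameters are assumptions, not the paper's code.
class MultiStrategySampler:
    def __init__(self, workspace_bounds, goal, sphere_center, sphere_radius,
                 goal_bias=0.1, initial_phase_iters=200):
        self.bounds = workspace_bounds        # [(lo, hi), ...] per dimension
        self.goal = goal                      # target position
        self.center = sphere_center           # sphere centre suggested by a DRL policy
        self.radius = sphere_radius
        self.goal_bias = goal_bias            # probability of goal-oriented sampling
        self.initial_phase_iters = initial_phase_iters

    def sample(self, iteration):
        # Goal-oriented sampling: occasionally return the goal itself.
        if random.random() < self.goal_bias:
            return list(self.goal)
        # Initial phase: prioritise the DRL-predicted sphere to cut invalid samples.
        if iteration < self.initial_phase_iters:
            return self._sample_in_sphere()
        # Later phase: fall back to uniform spatial sampling.
        return [random.uniform(lo, hi) for lo, hi in self.bounds]

    def _sample_in_sphere(self):
        # Rejection-sample a point inside the sphere, clipped to the workspace.
        while True:
            point = [self.center[i] + random.uniform(-self.radius, self.radius)
                     for i in range(len(self.center))]
            if sum((p - c) ** 2 for p, c in zip(point, self.center)) <= self.radius ** 2:
                return [min(max(p, lo), hi) for p, (lo, hi) in zip(point, self.bounds)]


if __name__ == "__main__":
    sampler = MultiStrategySampler(
        workspace_bounds=[(0.0, 5.0)] * 3,
        goal=(4.5, 4.5, 1.0),
        sphere_center=(2.0, 2.0, 1.0),
        sphere_radius=0.5,
    )
    for it in (10, 500):
        print(it, sampler.sample(it))
```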

19 pages, 10296 KiB  
Article
Extended Maximum Actor–Critic Framework Based on Policy Gradient Reinforcement for System Optimization
by Jung-Hyun Kim, Yong-Hoon Choi, You-Rak Choi, Jae-Hyeok Jeong and Min-Suk Kim
Appl. Sci. 2025, 15(4), 1828; https://doi.org/10.3390/app15041828 - 11 Feb 2025
Viewed by 832
Abstract
Recently, significant research efforts have been directed toward leveraging Artificial Intelligence for sensor data processing and system control. In particular, effective control systems must determine optimal paths and trajectories from sensor data. For instance, model-predictive control based on Proportional-Integral-Derivative models is intuitive, efficient, and provides outstanding control performance. However, tracking challenges persist, which requires active research on integrating and optimizing control systems with Machine Learning. Specifically, Reinforcement Learning, a branch of Machine Learning, has been used in several research fields to solve optimal control problems. In this paper, we propose an Extended Maximum Actor–Critic, a Reinforcement Learning-based method that combines the advantages of both value and policy approaches to enhance the learning stability of actor–critic methods for system control optimization. The proposed method integrates the actor and a maximized actor in the learning process to evaluate and identify actions with the highest value, facilitating effective exploration. Additionally, to enhance the efficiency and robustness of agent learning, we propose Prioritized Hindsight Experience Replay, which combines the advantages of Prioritized Experience Replay (PER) and Hindsight Experience Replay. To verify this, we performed evaluations and experiments examining training stability in MuJoCo, a physics simulator widely used for Reinforcement Learning. The proposed Prioritized Hindsight Experience Replay method significantly improves learning compared with the standard replay buffer and PER in simulated environments such as the simple HalfCheetah-v4 and the complex Ant-v4. It also achieves a higher success rate than PER in FetchReach-v2, demonstrating the effectiveness of our proposed method in more complex reward environments.
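
A minimal sketch of how prioritized sampling can be combined with hindsight goal relabelling, assuming goal-conditioned transitions and the "final" relabelling strategy; the field names, priority exponent, and buffer layout are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

# Illustrative combination of HER relabelling with PER-style priority sampling.
class PrioritizedHindsightBuffer:
    def __init__(self, capacity=100_000, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha                 # how strongly TD error shapes sampling
        self.transitions = []
        self.priorities = []

    def add_episode(self, episode, reward_fn):
        # Store original transitions, then relabel each one with the goal actually
        # achieved at the end of the episode (the "final" HER strategy).
        final_achieved = episode[-1]["achieved_goal"]
        for t in episode:
            self._add(t)
            relabelled = dict(t)
            relabelled["goal"] = final_achieved
            relabelled["reward"] = reward_fn(t["achieved_goal"], final_achieved)
            self._add(relabelled)

    def _add(self, transition):
        max_p = max(self.priorities, default=1.0)   # new samples get max priority
        if len(self.transitions) >= self.capacity:
            self.transitions.pop(0)
            self.priorities.pop(0)
        self.transitions.append(transition)
        self.priorities.append(max_p)

    def sample(self, batch_size):
        # Sampling probability proportional to priority^alpha, as in PER.
        probs = np.array(self.priorities) ** self.alpha
        probs /= probs.sum()
        idx = np.random.choice(len(self.transitions), batch_size, p=probs)
        return idx, [self.transitions[i] for i in idx]

    def update_priorities(self, idx, td_errors, eps=1e-5):
        # After a gradient step, refresh priorities from the new TD errors.
        for i, err in zip(idx, td_errors):
            self.priorities[i] = abs(err) + eps
```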

15 pages, 1329 KiB  
Article
Prioritized Hindsight with Dual Buffer for Meta-Reinforcement Learning
by Sofanit Wubeshet Beyene and Ji-Hyeong Han
Electronics 2022, 11(24), 4192; https://doi.org/10.3390/electronics11244192 - 15 Dec 2022
Cited by 2 | Viewed by 2688
Abstract
Sharing prior knowledge across multiple robotic manipulation tasks is a challenging research topic. Although state-of-the-art deep reinforcement learning (DRL) algorithms have shown immense success in single robotic tasks, it is still challenging to extend these algorithms to resolve multi-task manipulation problems directly. This is mostly due to the difficulty of efficient exploration in high-dimensional state and continuous action spaces. Furthermore, in multi-task scenarios, the sparse-reward problem and the sample inefficiency of DRL algorithms are exacerbated. Therefore, we propose a method to increase the sample efficiency of the soft actor-critic (SAC) algorithm and extend it to a multi-task setting. The agent learns a prior policy from two structurally similar tasks and adapts the policy to a target task. We propose a prioritized hindsight with dual experience replay to improve the data storage and sampling technique, which, in turn, assists the agent in performing structured exploration that leads to sample efficiency. The proposed method separates the experience replay buffer into two buffers, one for real trajectories and one for hindsight trajectories, to reduce the bias introduced by the hindsight trajectories. Moreover, we utilize high-reward transitions from previous tasks to help the network adapt easily to the new task. We demonstrate the proposed method on several manipulation tasks using a 7-DoF robotic arm in RLBench. The experimental results show that the proposed method outperforms vanilla SAC in both single-task and multi-task settings.
(This article belongs to the Special Issue Advanced Machine Learning for Intelligent Robotics)
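
A rough sketch of the dual-buffer idea described above: real trajectories and hindsight-relabelled trajectories are kept in separate buffers so that relabelled data cannot dominate sampling. The mixing ratio, buffer capacity, and class names are assumptions for illustration, not the authors' implementation.

```python
import random

# Illustrative dual replay buffer: real and hindsight transitions stored separately,
# with a fixed mixing ratio at sampling time to bound the hindsight-induced bias.
class DualReplayBuffer:
    def __init__(self, capacity=50_000, hindsight_ratio=0.5):
        self.real = []            # transitions as actually experienced
        self.hindsight = []       # transitions with goals replaced by achieved goals
        self.capacity = capacity
        self.hindsight_ratio = hindsight_ratio

    def add_real(self, transition):
        self._push(self.real, transition)

    def add_hindsight(self, transition):
        self._push(self.hindsight, transition)

    def _push(self, buf, transition):
        if len(buf) >= self.capacity:
            buf.pop(0)            # drop the oldest transition when full
        buf.append(transition)

    def sample(self, batch_size):
        # Draw a fixed fraction from each buffer so hindsight data stays bounded.
        n_hind = min(int(batch_size * self.hindsight_ratio), len(self.hindsight))
        n_real = min(batch_size - n_hind, len(self.real))
        return random.sample(self.real, n_real) + random.sample(self.hindsight, n_hind)
```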

15 pages, 5564 KiB  
Article
Exploration with Multiple Random ε-Buffers in Off-Policy Deep Reinforcement Learning
by Chayoung Kim and JiSu Park
Symmetry 2019, 11(11), 1352; https://doi.org/10.3390/sym11111352 - 1 Nov 2019
Cited by 2 | Viewed by 4991
Abstract
In deep reinforcement learning (RL), exploration is highly significant for achieving better generalization. In benchmark studies, ε-greedy random actions have been used to encourage exploration and prevent over-fitting, thereby improving generalization. Deep RL with random ε-greedy policies, such as deep Q-networks (DQNs), can demonstrate efficient exploration behavior. A random ε-greedy policy exploits additional replay buffers in environments with sparse and binary rewards, such as real-time online network security detection that verifies whether traffic is “normal or anomalous.” Prior studies have shown that a prioritized replay memory based on the temporal-difference error provides superior theoretical results. However, other implementations have shown that, in certain environments, prioritized replay memory is not superior to the randomly selected buffers of a random ε-greedy policy. Moreover, a key idea of hindsight experience replay, namely using additional buffers corresponding to each different goal, inspires our objective. Therefore, we exploit multiple random ε-greedy buffers to enhance exploration toward near-perfect generalization with one original goal in off-policy RL. We demonstrate the benefit of off-policy learning with our method through an experimental comparison of DQN and deep deterministic policy gradient for discrete actions as well as continuous control in fully symmetric environments.
(This article belongs to the Special Issue Symmetry-Adapted Machine Learning for Information Security)
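
As a hedged illustration of the multiple-buffer idea (not the paper's exact scheme), the sketch below keeps several replay buffers, routes each transition collected under an ε-greedy policy to a randomly chosen buffer, and samples each minibatch from one randomly selected buffer. The number of buffers and the routing rule are assumptions.

```python
import random
from collections import deque

# Illustrative multiple random ε-buffers for off-policy learning.
class MultiEpsilonBuffers:
    def __init__(self, n_buffers=4, capacity=20_000):
        self.buffers = [deque(maxlen=capacity) for _ in range(n_buffers)]

    def store(self, transition):
        # Route each transition to a uniformly random buffer.
        random.choice(self.buffers).append(transition)

    def sample(self, batch_size):
        # Pick one sufficiently full buffer at random, then sample a minibatch from it.
        non_empty = [b for b in self.buffers if len(b) >= batch_size]
        if not non_empty:
            return []
        return random.sample(list(random.choice(non_empty)), batch_size)


def epsilon_greedy(q_values, epsilon):
    # Standard ε-greedy action selection over a list of Q-values.
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```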
