Search Results (4)

Search Parameters:
Keywords = hindsight and prioritized experience

23 pages, 42651 KiB  
Article
Research on High-Precision Motion Planning of Large Multi-Arm Rock Drilling Robot Based on Multi-Strategy Sampling Rapidly Exploring Random Tree*
by Qiaoyu Xu and Yansong Lin
Sensors 2025, 25(9), 2654; https://doi.org/10.3390/s25092654 - 22 Apr 2025
Cited by 1 | Viewed by 670
Abstract
To address the optimal motion planning problem for multi-arm rock drilling robots, this paper introduces a high-precision motion planning method based on Multi-Strategy Sampling RRT* (MSS-RRT*). A dual Jacobi iterative inverse solution method, coupled with a forward kinematics error compensation model, is introduced to dynamically correct target positions, improving end-effector positioning accuracy. A multi-strategy sampling mechanism is constructed by integrating DRL position sphere sampling, spatial random sampling, and goal-oriented sampling. This mechanism flexibly applies the three sampling methods at different stages of path planning, significantly improving the adaptability and search efficiency of the RRT* algorithm. In particular, DRL position sphere sampling is prioritized during the initial phase, effectively reducing the number of invalid sampling points. For training the three-arm DRL model with the twin delayed deep deterministic policy gradient (TD3) algorithm, the Hindsight Experience Replay-Obstacle Arm Transfer (HER-OAT) method is used for data replay, and a cylindrical bounding box method effectively prevents collisions between arms. The experimental results show that the proposed method improves motion planning accuracy by 94.15% compared to a single Jacobi iteration. MSS-RRT* can plan a superior path in a shorter duration: the planning time under optimal path conditions is only 20.71% of that required by Informed-RRT*, and the path length is reduced by 21.58% compared to Quick-RRT* under the same time constraints.
(This article belongs to the Section Sensors and Robotics)
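
As a rough illustration of the multi-strategy sampling idea described in the abstract (not the authors' implementation), the Python sketch below switches between sampling inside a DRL-predicted position sphere, uniform spatial sampling, and goal-oriented sampling depending on the planning stage. The stage threshold, goal bias, and sphere parameters are assumptions for illustration only.

```python
import random

# Hypothetical sketch of a multi-strategy sampler in the spirit of MSS-RRT*.
# Strategy names, thresholds, and parameters are assumptions, not the paper's code.
class MultiStrategySampler:
    def __init__(self, workspace_bounds, goal, sphere_center, sphere_radius,
                 goal_bias=0.1, initial_phase_iters=200):
        self.bounds = workspace_bounds        # [(lo, hi), ...] per dimension
        self.goal = goal                      # target position
        self.center = sphere_center           # sphere centre suggested by a DRL policy
        self.radius = sphere_radius
        self.goal_bias = goal_bias            # probability of goal-oriented sampling
        self.initial_phase_iters = initial_phase_iters

    def sample(self, iteration):
        # Goal-oriented sampling: occasionally return the goal itself.
        if random.random() < self.goal_bias:
            return list(self.goal)
        # Initial phase: prioritise the DRL-predicted sphere to cut invalid samples.
        if iteration < self.initial_phase_iters:
            return self._sample_in_sphere()
        # Later phase: fall back to uniform spatial sampling.
        return [random.uniform(lo, hi) for lo, hi in self.bounds]

    def _sample_in_sphere(self):
        # Rejection-sample a point inside the sphere, clipped to the workspace.
        while True:
            point = [self.center[i] + random.uniform(-self.radius, self.radius)
                     for i in range(len(self.center))]
            if sum((p - c) ** 2 for p, c in zip(point, self.center)) <= self.radius ** 2:
                return [min(max(p, lo), hi) for p, (lo, hi) in zip(point, self.bounds)]


if __name__ == "__main__":
    sampler = MultiStrategySampler(
        workspace_bounds=[(0.0, 5.0)] * 3,
        goal=(4.5, 4.5, 1.0),
        sphere_center=(2.0, 2.0, 1.0),
        sphere_radius=0.5,
    )
    for it in (10, 500):
        print(it, sampler.sample(it))
```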

19 pages, 10296 KiB  
Article
Extended Maximum Actor–Critic Framework Based on Policy Gradient Reinforcement for System Optimization
by Jung-Hyun Kim, Yong-Hoon Choi, You-Rak Choi, Jae-Hyeok Jeong and Min-Suk Kim
Appl. Sci. 2025, 15(4), 1828; https://doi.org/10.3390/app15041828 - 11 Feb 2025
Viewed by 832
Abstract
Recently, significant research efforts have been directed toward leveraging Artificial Intelligence for sensor data processing and system control. In particular, effective control systems must determine optimal paths and trajectories from sensor data. For instance, model-predictive control based on Proportional-Integral-Derivative models is intuitive, efficient, and provides outstanding control performance. However, tracking challenges persist, which requires active research on integrating and optimizing control systems with Machine Learning. Specifically, Reinforcement Learning, a branch of Machine Learning, has been used in several research fields to solve optimal control problems. In this paper, we propose an Extended Maximum Actor–Critic, a Reinforcement Learning-based method that combines the advantages of both value and policy approaches to enhance the learning stability of actor–critic methods for system control optimization. The proposed method integrates the actor and a maximized actor in the learning process to evaluate and identify actions with the highest value, facilitating effective exploration. Additionally, to enhance the efficiency and robustness of agent learning, we propose Prioritized Hindsight Experience Replay, which combines the advantages of Prioritized Experience Replay (PER) and Hindsight Experience Replay. To verify this, we performed evaluations and experiments examining training stability in MuJoCo, a physics simulator widely used for Reinforcement Learning. The proposed Prioritized Hindsight Experience Replay method significantly improves learning compared with the standard replay buffer and PER in simulated environments such as the simple HalfCheetah-v4 and the complex Ant-v4. It also achieves a higher success rate than PER in FetchReach-v2, demonstrating the effectiveness of our proposed method in more complex reward environments.
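
A minimal sketch of how prioritized sampling can be combined with hindsight goal relabelling, assuming goal-conditioned transitions and the "final" relabelling strategy; the field names, priority exponent, and buffer layout are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

# Illustrative combination of HER relabelling with PER-style priority sampling.
class PrioritizedHindsightBuffer:
    def __init__(self, capacity=100_000, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha                 # how strongly TD error shapes sampling
        self.transitions = []
        self.priorities = []

    def add_episode(self, episode, reward_fn):
        # Store original transitions, then relabel each one with the goal actually
        # achieved at the end of the episode (the "final" HER strategy).
        final_achieved = episode[-1]["achieved_goal"]
        for t in episode:
            self._add(t)
            relabelled = dict(t)
            relabelled["goal"] = final_achieved
            relabelled["reward"] = reward_fn(t["achieved_goal"], final_achieved)
            self._add(relabelled)

    def _add(self, transition):
        max_p = max(self.priorities, default=1.0)   # new samples get max priority
        if len(self.transitions) >= self.capacity:
            self.transitions.pop(0)
            self.priorities.pop(0)
        self.transitions.append(transition)
        self.priorities.append(max_p)

    def sample(self, batch_size):
        # Sampling probability proportional to priority^alpha, as in PER.
        probs = np.array(self.priorities) ** self.alpha
        probs /= probs.sum()
        idx = np.random.choice(len(self.transitions), batch_size, p=probs)
        return idx, [self.transitions[i] for i in idx]

    def update_priorities(self, idx, td_errors, eps=1e-5):
        # After a gradient step, refresh priorities from the new TD errors.
        for i, err in zip(idx, td_errors):
            self.priorities[i] = abs(err) + eps
```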

15 pages, 1329 KiB  
Article
Prioritized Hindsight with Dual Buffer for Meta-Reinforcement Learning
by Sofanit Wubeshet Beyene and Ji-Hyeong Han
Electronics 2022, 11(24), 4192; https://doi.org/10.3390/electronics11244192 - 15 Dec 2022
Cited by 2 | Viewed by 2688
Abstract
Sharing prior knowledge across multiple robotic manipulation tasks is a challenging research topic. Although state-of-the-art deep reinforcement learning (DRL) algorithms have shown immense success in single robotic tasks, it is still challenging to extend these algorithms to resolve multi-task manipulation problems directly. This is mostly due to the difficulty of efficient exploration in high-dimensional state and continuous action spaces. Furthermore, in multi-task scenarios, the sparse-reward problem and the sample inefficiency of DRL algorithms are exacerbated. Therefore, we propose a method to increase the sample efficiency of the soft actor-critic (SAC) algorithm and extend it to a multi-task setting. The agent learns a prior policy from two structurally similar tasks and adapts the policy to a target task. We propose a prioritized hindsight with dual experience replay to improve the data storage and sampling technique, which, in turn, assists the agent in performing structured exploration that leads to sample efficiency. The proposed method separates the experience replay buffer into two buffers, one for real trajectories and one for hindsight trajectories, to reduce the bias introduced by the hindsight trajectories. Moreover, we utilize high-reward transitions from previous tasks to help the network adapt easily to the new task. We demonstrate the proposed method on several manipulation tasks using a 7-DoF robotic arm in RLBench. The experimental results show that the proposed method outperforms vanilla SAC in both single-task and multi-task settings.
(This article belongs to the Special Issue Advanced Machine Learning for Intelligent Robotics)
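
A rough sketch of the dual-buffer idea described above: real trajectories and hindsight-relabelled trajectories are kept in separate buffers so that relabelled data cannot dominate sampling. The mixing ratio, buffer capacity, and class names are assumptions for illustration, not the authors' implementation.

```python
import random

# Illustrative dual replay buffer: real and hindsight transitions stored separately,
# with a fixed mixing ratio at sampling time to bound the hindsight-induced bias.
class DualReplayBuffer:
    def __init__(self, capacity=50_000, hindsight_ratio=0.5):
        self.real = []            # transitions as actually experienced
        self.hindsight = []       # transitions with goals replaced by achieved goals
        self.capacity = capacity
        self.hindsight_ratio = hindsight_ratio

    def add_real(self, transition):
        self._push(self.real, transition)

    def add_hindsight(self, transition):
        self._push(self.hindsight, transition)

    def _push(self, buf, transition):
        if len(buf) >= self.capacity:
            buf.pop(0)            # drop the oldest transition when full
        buf.append(transition)

    def sample(self, batch_size):
        # Draw a fixed fraction from each buffer so hindsight data stays bounded.
        n_hind = min(int(batch_size * self.hindsight_ratio), len(self.hindsight))
        n_real = min(batch_size - n_hind, len(self.real))
        return random.sample(self.real, n_real) + random.sample(self.hindsight, n_hind)
```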

15 pages, 5564 KiB  
Article
Exploration with Multiple Random ε-Buffers in Off-Policy Deep Reinforcement Learning
by Chayoung Kim and JiSu Park
Symmetry 2019, 11(11), 1352; https://doi.org/10.3390/sym11111352 - 1 Nov 2019
Cited by 2 | Viewed by 4991
Abstract
In deep reinforcement learning (RL), exploration is highly significant for achieving better generalization. In benchmark studies, ε-greedy random actions have been used to encourage exploration and prevent over-fitting, thereby improving generalization. Deep RL with random ε-greedy policies, such as deep Q-networks (DQNs), can demonstrate efficient exploration behavior. A random ε-greedy policy exploits additional replay buffers in environments with sparse and binary rewards, such as real-time online network security detection that verifies whether traffic is “normal or anomalous.” Prior studies have shown that a prioritized replay memory based on the temporal-difference error provides superior theoretical results. However, other implementations have shown that, in certain environments, prioritized replay memory is not superior to the randomly selected buffers of a random ε-greedy policy. Moreover, a key idea of hindsight experience replay, namely using additional buffers corresponding to each different goal, inspires our objective. Therefore, we exploit multiple random ε-greedy buffers to enhance exploration toward near-perfect generalization with one original goal in off-policy RL. We demonstrate the benefit of off-policy learning with our method through an experimental comparison of DQN and deep deterministic policy gradient for discrete actions as well as continuous control in fully symmetric environments.
(This article belongs to the Special Issue Symmetry-Adapted Machine Learning for Information Security)
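
As a hedged illustration of the multiple-buffer idea (not the paper's exact scheme), the sketch below keeps several replay buffers, routes each transition collected under an ε-greedy policy to a randomly chosen buffer, and samples each minibatch from one randomly selected buffer. The number of buffers and the routing rule are assumptions.

```python
import random
from collections import deque

# Illustrative multiple random ε-buffers for off-policy learning.
class MultiEpsilonBuffers:
    def __init__(self, n_buffers=4, capacity=20_000):
        self.buffers = [deque(maxlen=capacity) for _ in range(n_buffers)]

    def store(self, transition):
        # Route each transition to a uniformly random buffer.
        random.choice(self.buffers).append(transition)

    def sample(self, batch_size):
        # Pick one sufficiently full buffer at random, then sample a minibatch from it.
        non_empty = [b for b in self.buffers if len(b) >= batch_size]
        if not non_empty:
            return []
        return random.sample(list(random.choice(non_empty)), batch_size)


def epsilon_greedy(q_values, epsilon):
    # Standard ε-greedy action selection over a list of Q-values.
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```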
