Search Results (45)

Search Parameters:
Keywords = OpenAI Gym

19 pages, 6307 KB  
Article
Robust Guidance Policies Through Deep Reinforcement Learning
by Seongyeon Kim, Jongho Shin and Hyeong-Geun Kim
Aerospace 2026, 13(3), 233; https://doi.org/10.3390/aerospace13030233 - 2 Mar 2026
Viewed by 539
Abstract
Unmanned aerial vehicle (UAV) guidance systems must operate reliably under significant uncertainties, such as sensor noise, target maneuvers, and environmental disturbances. Traditional guidance methods like proportional navigation (PN), while computationally efficient, often struggle to maintain performance under such challenging conditions. To overcome these limitations, this study proposes a robust UAV guidance framework based on deep reinforcement learning (DRL), specifically utilizing the soft actor–critic (SAC) algorithm. The UAV–target tracking problem is formulated as a Markov decision process (MDP) for both two-dimensional (2D) and three-dimensional (3D) scenarios. A deep neural network policy is trained in noisy environments to generate acceleration commands that minimize the zero-effort miss (ZEM). Extensive numerical simulations conducted using OpenAI Gym validate the effectiveness of the proposed method under previously unseen initial conditions and increased noise levels. The results demonstrate that the SAC-based policy achieves higher tracking success rates than PN, particularly under strict terminal conditions and observation noise.
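
As a rough illustration of how a guidance task like this can be framed for SAC training, the sketch below defines a minimal 2D pursuit environment in the classic OpenAI Gym API whose reward penalizes the zero-effort miss. The state layout, dynamics, noise model, and constants are assumptions of this sketch, not the paper's formulation.

```python
import numpy as np
import gym
from gym import spaces

class PursuitEnv2D(gym.Env):
    """Minimal 2D UAV-target pursuit environment (illustrative only)."""

    def __init__(self, dt=0.1, noise_std=0.05):
        self.dt = dt
        self.noise_std = noise_std
        # Observation: relative position (2) and relative velocity (2).
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(4,), dtype=np.float32)
        # Action: lateral acceleration command, normalized to [-1, 1].
        self.action_space = spaces.Box(-1.0, 1.0, shape=(1,), dtype=np.float32)

    def reset(self):
        self.rel_pos = np.random.uniform(-1000, 1000, size=2)
        self.rel_vel = np.random.uniform(-50, 50, size=2)
        return self._obs()

    def _obs(self):
        # Sensor noise is injected into the observation, as in noisy training.
        s = np.concatenate([self.rel_pos, self.rel_vel])
        return (s + np.random.normal(0.0, self.noise_std * np.abs(s))).astype(np.float32)

    def _zem(self):
        # Zero-effort miss: closest approach if neither side accelerates further.
        t_go = max(-self.rel_pos @ self.rel_vel / (self.rel_vel @ self.rel_vel + 1e-8), 0.0)
        return np.linalg.norm(self.rel_pos + self.rel_vel * t_go)

    def step(self, action):
        # Apply the command perpendicular to the line of sight (simplified).
        los = self.rel_pos / (np.linalg.norm(self.rel_pos) + 1e-8)
        perp = np.array([-los[1], los[0]])
        self.rel_vel += 30.0 * float(action[0]) * perp * self.dt
        self.rel_pos += self.rel_vel * self.dt
        done = np.linalg.norm(self.rel_pos) < 5.0 or self._zem() > 5000.0
        reward = -self._zem() * 1e-3  # shaped to minimize ZEM
        return self._obs(), reward, done, {}
```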

36 pages, 3276 KB  
Article
Robot Planning via LLM Proposals and Symbolic Verification
by Drejc Pesjak and Jure Žabkar
Mach. Learn. Knowl. Extr. 2026, 8(1), 22; https://doi.org/10.3390/make8010022 - 16 Jan 2026
Viewed by 2531
Abstract
Planning in robotics represents an ongoing research challenge, as it requires the integration of sensing, reasoning, and execution. Although large language models (LLMs) provide a high degree of flexibility in planning, they often introduce hallucinated goals and actions and consequently lack the formal reliability of deterministic methods. In this paper, we address this limitation by proposing a hybrid Sense–Plan–Code–Act (SPCA) framework that combines perception, LLM-based reasoning, and symbolic planning. Within the proposed approach, sensory information is first transformed into a symbolic description of the world in Planning Domain Definition Language (PDDL) using an LLM. A heuristic planner is then used to generate a valid plan, which is subsequently converted to code by a second LLM. The generated code is first validated syntactically through compilation and then semantically in simulation. When errors are detected, local corrections can be applied and the process is repeated as necessary. The proposed method is evaluated in the OpenAI Gym MiniGrid reinforcement learning environment and in a Gazebo simulation on a UR5 robotic arm using a curriculum of tasks with increasing complexity. The system successfully completes approximately 71–75% of tasks across environments with a relatively low number of simulation iterations.
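
A minimal sketch of the propose-verify-repair loop the abstract describes, assuming Python's built-in compile() for syntactic checking and a toy exec-based world model for semantic checking; the propose callable stands in for the LLM proposal/repair step, and all names here are hypothetical rather than the authors' API.

```python
def compile_check(code: str):
    """Syntactic validation: try to compile the generated code."""
    try:
        compile(code, "<generated>", "exec")
        return True, None
    except SyntaxError as e:
        return False, str(e)

def simulate(code: str, env_state: dict):
    """Semantic validation: run the code against a toy world model."""
    scope = {"state": dict(env_state)}
    try:
        exec(code, scope)
        # Toy goal test: did the generated code end up holding the cube?
        return scope["state"].get("holding") == "cube", None
    except Exception as e:
        return False, str(e)

def plan_code_act(propose, env_state, max_repairs=3):
    """propose(error_feedback) stands in for the LLM proposal/repair step."""
    code = propose(None)
    for _ in range(max_repairs):
        ok, err = compile_check(code)
        if ok:
            ok, err = simulate(code, env_state)
        if ok:
            return code          # verified plan code, ready to execute
        code = propose(err)      # local correction from error feedback
    return None
```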

16 pages, 1725 KB  
Article
A Reinforcement Learning-Based Link State Optimization for Handover and Link Duration Performance Enhancement in Low Earth Orbit Satellite Networks
by Sihwa Jin, Doyeon Park, Sieun Kim, Jinho Lee and Inwhee Joe
Electronics 2026, 15(2), 398; https://doi.org/10.3390/electronics15020398 - 16 Jan 2026
Viewed by 643
Abstract
This study proposes a reinforcement learning-based link selection method for Low Earth Orbit (LEO) satellite networks, aiming to reduce handover frequency while extending link duration under highly dynamic orbital environments. The proposed approach relies solely on basic satellite positional information, namely latitude, longitude, and altitude, to construct compact state representations without requiring complex sensing or prediction mechanisms. Using relative satellite and terminal geometry, each state is represented as a vector consisting of azimuth, elevation, range, and direction difference. To validate the feasibility of policy learning under realistic conditions, a total of 871,105 orbit-based data samples were generated through simulations of 300 LEO satellite orbits. The reinforcement learning environment was implemented using the OpenAI Gym framework, in which an agent selects an optimal communication target from a prefiltered set of candidate satellites at each time step. Three reinforcement learning algorithms, namely SARSA, Q-Learning, and Deep Q-Network, were evaluated under identical experimental conditions. Performance was assessed in terms of smoothed total reward per episode, average handover count, and average link duration. The results show that the Deep Q-Network-based approach achieves approximately 77.4% fewer handovers than SARSA and 49.9% fewer than Q-Learning, while providing the longest average link duration. These findings demonstrate that effective handover control can be achieved using lightweight state information and indicate the potential of deep reinforcement learning for future LEO satellite communication systems.
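
A toy version of such a link-selection environment, written against the classic OpenAI Gym API: the agent picks one of K prefiltered candidate satellites per step, each described by azimuth, elevation, range, and direction difference. The candidate generator, reward weights, and episode length are illustrative assumptions, not the paper's model.

```python
import numpy as np
import gym
from gym import spaces

K = 5  # candidate satellites per step (illustrative)

class LeoLinkEnv(gym.Env):
    """Toy LEO link-selection environment (illustrative only)."""

    def __init__(self, episode_len=200, handover_penalty=1.0):
        self.episode_len = episode_len
        self.handover_penalty = handover_penalty
        # Per candidate: azimuth, elevation, range, direction difference.
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(K, 4), dtype=np.float32)
        self.action_space = spaces.Discrete(K)

    def reset(self):
        self.t, self.current = 0, None
        return self._candidates()

    def _candidates(self):
        az = np.random.uniform(0, 360, K)
        el = np.random.uniform(10, 90, K)
        rng = np.random.uniform(500, 2000, K)   # slant range, km
        ddiff = np.random.uniform(0, 180, K)
        return np.stack([az, el, rng, ddiff], axis=1).astype(np.float32)

    def step(self, action):
        # Reward link continuity; penalize switching to a new satellite.
        reward = 1.0
        if self.current is not None and action != self.current:
            reward -= self.handover_penalty
        self.current = action
        self.t += 1
        return self._candidates(), reward, self.t >= self.episode_len, {}
```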

18 pages, 812 KB  
Article
Deep Reinforcement Learning for Adaptive Robotic Grasping and Post-Grasp Manipulation in Simulated Dynamic Environments
by Henrique C. Ferreira and Ramiro S. Barbosa
Future Internet 2025, 17(10), 437; https://doi.org/10.3390/fi17100437 - 26 Sep 2025
Cited by 2 | Viewed by 3808
Abstract
This article presents a deep reinforcement learning (DRL) approach for adaptive robotic grasping in dynamic environments. We developed UR5GraspingEnv, a PyBullet-based simulation environment integrated with OpenAI Gym, to train a UR5 robotic arm with a Robotiq 2F-85 gripper. Soft Actor-Critic (SAC) and Proximal Policy Optimization (PPO) were implemented to learn robust grasping policies for randomly positioned objects. A tailored reward function combining distance penalties, grasp rewards, and pose rewards optimizes grasping and post-grasp tasks, enhanced by domain randomization. SAC achieves an 87% grasp success rate and 75% post-grasp success, outperforming PPO (82% and 68%), with stable convergence over 100,000 timesteps. The system addresses post-grasp manipulation and sim-to-real transfer challenges, advancing industrial and assistive applications. The results demonstrate the feasibility of learning stable, goal-driven policies for single-arm robotic manipulation using minimal supervision. Both PPO and SAC yield competitive performance, with SAC exhibiting superior adaptability in cluttered or edge cases. These findings suggest that DRL, when carefully designed and monitored, can support scalable learning in manipulation tasks.
(This article belongs to the Special Issue Artificial Intelligence and Control Systems for Industry 4.0 and 5.0)
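
A sketch of the kind of composite reward the abstract describes, combining a distance penalty with grasp and pose bonuses; the weights, tolerances, and function signature are assumptions, not the paper's tuned values.

```python
import numpy as np

def grasp_reward(ee_pos, obj_pos, goal_pose, obj_pose, grasped,
                 w_dist=1.0, w_grasp=5.0, w_pose=10.0, pose_tol=0.05):
    """Composite reward: distance penalty + grasp bonus + post-grasp pose bonus.

    Weights and tolerances are illustrative, not the paper's tuned values.
    """
    # Penalize end-effector distance to the object while approaching it.
    reward = -w_dist * np.linalg.norm(np.asarray(ee_pos) - np.asarray(obj_pos))
    if grasped:
        reward += w_grasp  # bonus for a successful grasp
        # Post-grasp: reward moving the held object toward the target pose.
        pose_err = np.linalg.norm(np.asarray(obj_pose) - np.asarray(goal_pose))
        reward += w_pose * float(pose_err < pose_tol) - pose_err
    return reward
```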

23 pages, 3829 KB  
Article
Causal Correction and Compensation Network for Robotics: Applications and Validation in Continuous Control
by Xiaoqing Zhu, Lanyue Bi, Tong Wu, Chuan Zhang and Jiahao Wu
Appl. Sci. 2025, 15(17), 9628; https://doi.org/10.3390/app15179628 - 1 Sep 2025
Cited by 1 | Viewed by 1198
Abstract
Deep Reinforcement Learning (DRL) has achieved remarkable success in robotic control, autonomous driving, and game-playing agents. However, its decision-making process often remains a black box, lacking both interpretability and verifiability. In robotic control tasks, developers cannot pinpoint decision errors or precisely adjust control strategies based solely on observed robot behaviors. To address this challenge, this work proposes an interpretable DRL framework based on a Causal Correction and Compensation Network (C2-Net), which systematically captures the causal relationships underlying decision-making and enhances policy robustness. C2-Net integrates a Graph Neural Network-based Neural Causal Model (GNN-NCM) to compute causal influence weights for each action. These weights are then dynamically applied to correct and compensate the raw policy outputs, thereby balancing performance optimization and transparency. This work validates the approach on OpenAI Gym's Hopper, Walker2d, and Humanoid environments, as well as the multi-agent AzureLoong platform built on Isaac Gym. Experimental results show that C2-Net outperforms both non-causal baselines and conventional attention-based models in convergence speed, final return, and policy robustness. Moreover, it provides rich causal explanations for its decisions. The framework represents a principled shift from correlation to causation and offers a practical solution for the safe and reliable deployment of multi-robot systems.

60 pages, 633 KB  
Article
Secure and Trustworthy Open Radio Access Network (O-RAN) Optimization: A Zero-Trust and Federated Learning Framework for 6G Networks
by Mohammed El-Hajj
Future Internet 2025, 17(6), 233; https://doi.org/10.3390/fi17060233 - 25 May 2025
Cited by 14 | Viewed by 6165
Abstract
The Open Radio Access Network (O-RAN) paradigm promises unprecedented flexibility and cost efficiency for 6G networks but introduces critical security risks due to its disaggregated, AI-driven architecture. This paper proposes a secure optimization framework integrating zero-trust principles and privacy-preserving Federated Learning (FL) to address vulnerabilities in O-RAN’s RAN Intelligent Controllers (RICs) and xApps/rApps. We first establish a novel threat model targeting O-RAN’s optimization processes, highlighting risks such as adversarial Machine Learning (ML) attacks on resource allocation models and compromised third-party applications. To mitigate these, we design a Zero-Trust Architecture (ZTA) enforcing continuous authentication and micro-segmentation for RIC components, coupled with an FL framework that enables collaborative ML training across operators without exposing raw network data. A differential privacy mechanism is applied to global model updates to prevent inference attacks. We validate our framework using the DAWN Dataset (5G/6G traffic traces with slicing configurations) and the OpenRAN Gym Dataset (O-RAN-compliant resource utilization metrics) to simulate energy efficiency optimization under adversarial conditions. A dynamic DU sleep scheduling case study demonstrates 32% energy savings with <5% latency degradation, even when data poisoning attacks compromise 15% of the FL participants. Comparative analysis shows that our ZTA reduces unauthorized RIC access attempts by 89% compared to conventional O-RAN security baselines. This work bridges the gap between performance optimization and trustworthiness in next-generation O-RAN, offering actionable insights for 6G standardization.
(This article belongs to the Special Issue Secure and Trustworthy Next Generation O-RAN Optimisation)
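
A minimal sketch of the privacy-preserving aggregation step the abstract mentions: FedAvg-style averaging with per-client clipping and Gaussian noise on the global update. The clipping bound and noise multiplier are illustrative assumptions, not the paper's calibrated settings.

```python
import numpy as np

def dp_federated_average(client_updates, clip_norm=1.0, noise_mult=0.5):
    """FedAvg-style aggregation with clipping and Gaussian noise.

    A minimal sketch of differentially private model aggregation;
    clip_norm and noise_mult are illustrative, not the paper's settings.
    """
    clipped = []
    for update in client_updates:            # each update: flat ndarray of deltas
        norm = np.linalg.norm(update)
        clipped.append(update * min(1.0, clip_norm / (norm + 1e-12)))
    mean = np.mean(clipped, axis=0)
    # Noise scaled to the clipping bound limits what any one client reveals.
    sigma = noise_mult * clip_norm / len(client_updates)
    return mean + np.random.normal(0.0, sigma, size=mean.shape)
```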

23 pages, 4463 KB  
Article
Dual-Priority Delayed Deep Double Q-Network (DPD3QN): A Dueling Double Deep Q-Network with Dual-Priority Experience Replay for Autonomous Driving Behavior Decision-Making
by Shuai Li, Peicheng Shi, Aixi Yang, Heng Qi and Xinlong Dong
Algorithms 2025, 18(5), 291; https://doi.org/10.3390/a18050291 - 19 May 2025
Cited by 6 | Viewed by 1728
Abstract
The behavior decision control of autonomous vehicles is a critical aspect of advancing autonomous driving technology. However, current behavior decision algorithms based on deep reinforcement learning still face several challenges, such as insufficient safety and sparse reward mechanisms. To solve these problems, this paper proposes a dueling double deep Q-network based on dual-priority experience replay (DPD3QN). Initially, the dueling network is integrated with the double deep Q-network, and the original network's output layer is restructured to enhance the precision of action value estimation. Subsequently, dual-priority experience replay is incorporated to facilitate the model's ability to swiftly recognize and leverage critical experiences. Ultimately, training and evaluation are conducted on the OpenAI Gym simulation platform. The test results show that DPD3QN helps to improve the convergence speed of driverless vehicle behavior decision-making. Compared with the currently popular DQN and DDQN algorithms, this algorithm achieves higher success rates in challenging scenarios: in test scenario I, the success rate increases by 11.8 and 25.8 percentage points, respectively, while in test scenario II it rises by 8.8 and 22.2 percentage points, respectively, indicating a more secure and efficient autonomous driving decision-making capability.
(This article belongs to the Section Evolutionary Algorithms and Machine Learning)
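
Two of the building blocks DPD3QN combines are standard and easy to sketch: the dueling head decomposition and the double-Q target (the dual-priority replay itself is not reproduced here). A minimal PyTorch version, with layer sizes as assumptions:

```python
import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    """Dueling head: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""

    def __init__(self, obs_dim, n_actions, hidden=128):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)
        self.adv = nn.Linear(hidden, n_actions)

    def forward(self, x):
        h = self.body(x)
        a = self.adv(h)
        return self.value(h) + a - a.mean(dim=1, keepdim=True)

def double_q_target(q_net, target_net, next_obs, reward, done, gamma=0.99):
    """Double DQN: the online net selects the action, the target net evaluates it."""
    with torch.no_grad():
        best = q_net(next_obs).argmax(dim=1, keepdim=True)
        next_q = target_net(next_obs).gather(1, best).squeeze(1)
        return reward + gamma * (1.0 - done) * next_q
```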

28 pages, 6260 KB  
Article
Development of Chiller Plant Models in OpenAI Gym Environment for Evaluating Reinforcement Learning Algorithms
by Xiangrui Wang, Qilin Zhang, Zhihua Chen, Jingjing Yang and Yixing Chen
Energies 2025, 18(9), 2225; https://doi.org/10.3390/en18092225 - 27 Apr 2025
Cited by 4 | Viewed by 2810
Abstract
In the face of the global energy crisis, the requirements of energy transition and sustainable development have emphasized the importance of controlling building energy management systems. Reinforcement learning (RL) has shown notable energy-saving potential in the optimal control of heating, ventilation, and air-conditioning (HVAC) systems. However, the coupling of algorithms and environments limits cross-scenario application. This paper develops chiller plant models in OpenAI Gym environments to evaluate different RL algorithms for optimizing condenser water loop control. A shopping mall in Changsha, China, was selected as the case study building. First, an energy simulation model in EnergyPlus was generated using AutoBPS. Then, the OpenAI Gym chiller plant system model was developed and validated by comparing it with the EnergyPlus simulation results. Moreover, two RL algorithms, Deep Q-Network (DQN) and Double Deep Q-Network (DDQN), were deployed to control the condenser water flow rate and the approach temperature of the cooling towers in the RL environment. Finally, the optimization performance of DQN across three climate zones was evaluated using the AutoBPS-Gym toolkit. The findings indicated that during the cooling season in the Changsha shopping mall, the DQN control method yielded energy savings of 14.16% for the cooling water system, whereas the DDQN method achieved savings of 14.01%. Using the average control values from DQN, the EnergyPlus simulation recorded an energy-saving rate of 10.42% compared to the baseline. Furthermore, implementing the DQN algorithm across the three climatic zones led to average energy savings of 4.0%, highlighting the toolkit's ability to effectively utilize RL for optimal control in various environmental contexts.
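
A skeleton of such a Gym chiller-plant environment, where a discrete action jointly sets the condenser water flow fraction and the cooling-tower approach temperature. The plant-energy expressions below are crude placeholders of my own; in the paper's toolkit the environment is a calibrated model validated against EnergyPlus.

```python
import numpy as np
import gym
from gym import spaces

class ChillerPlantEnv(gym.Env):
    """Skeleton condenser-water-loop environment (illustrative placeholder)."""

    def __init__(self, weather, flow_levels=(0.6, 0.8, 1.0), approach_levels=(2.0, 4.0, 6.0)):
        self.weather = weather                       # e.g., hourly wet-bulb temps
        self.actions = [(f, a) for f in flow_levels for a in approach_levels]
        self.action_space = spaces.Discrete(len(self.actions))
        # Observation: hour of day, wet-bulb temperature, cooling load fraction.
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(3,), dtype=np.float32)

    def reset(self):
        self.t = 0
        return self._obs()

    def _obs(self):
        return np.array([self.t % 24, self.weather[self.t], 0.7], dtype=np.float32)

    def step(self, action):
        flow_frac, approach = self.actions[action]
        # Placeholder energy model: pump/fan power rises with flow, chiller
        # power falls as condenser water gets colder (smaller approach).
        pump_fan_kw = 50.0 * flow_frac**3 + 30.0 / max(approach, 0.5)
        chiller_kw = 200.0 + 8.0 * approach - 20.0 * flow_frac
        reward = -(pump_fan_kw + chiller_kw)         # minimize total power
        self.t += 1
        done = self.t >= len(self.weather) - 1
        return self._obs(), reward, done, {}

# env = ChillerPlantEnv(weather=[26.0] * 720)  # e.g., one month of hourly data
```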

44 pages, 8130 KB  
Article
Classification-Based Q-Value Estimation for Continuous Actor-Critic Reinforcement Learning
by Chayoung Kim
Symmetry 2025, 17(5), 638; https://doi.org/10.3390/sym17050638 - 23 Apr 2025
Cited by 2 | Viewed by 1759
Abstract
Stable Q-value estimation is critical for effective policy learning in deep reinforcement learning (DRL), especially in continuous control tasks. Traditional algorithms like Soft Actor-Critic (SAC) and Twin Delayed Deep Deterministic policy gradient (TD3) rely on Mean Squared Error (MSE) loss for Q-value approximation, which may cause instability due to misestimation and overestimation biases. Although distributional reinforcement learning (RL) algorithms like C51 have improved robustness in discrete action spaces, their application to continuous control remains computationally expensive owing to the need for distribution projection. To address this, we propose a classification-based Q-value learning method that reformulates Q-value estimation as a classification problem rather than a regression task. Replacing MSE loss with cross-entropy (CE) and Kullback–Leibler (KL) divergence losses, the proposed method improves learning stability and mitigates overestimation errors. Our statistical analysis across 30 independent runs shows that the approach achieves an approximately 10% lower Q-value estimation error in the Pendulum environment and a 40–60% shorter training time compared to SAC and Continuous Twin Delayed Distributed Deep Deterministic (CTD4) policy gradient. Experimental results on OpenAI Gym benchmark environments demonstrate that our approach, with up to 77% fewer parameters, outperforms SAC and CTD4 in training stability and convergence speed, while maintaining competitive final policy performance.
(This article belongs to the Special Issue Symmetry in Intelligent Algorithms)
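
One common way to cast Q-value regression as classification, sketched below under my own assumptions: returns are discretized into bins, TD targets are encoded as two-hot distributions over that support, and the critic is trained with cross-entropy. The support range and bin count are illustrative, and soft-target cross_entropy requires a recent PyTorch (>= 1.10).

```python
import torch
import torch.nn.functional as F

# Discretize possible returns into bins (support); range is illustrative.
SUPPORT = torch.linspace(-100.0, 100.0, steps=51)

def two_hot(targets, support=SUPPORT):
    """Encode scalar targets as a two-hot distribution over the support."""
    targets = targets.clamp(float(support[0]), float(support[-1]))
    idx = torch.searchsorted(support, targets, right=True).clamp(1, len(support) - 1)
    lo, hi = support[idx - 1], support[idx]
    w_hi = (targets - lo) / (hi - lo)
    dist = torch.zeros(targets.shape[0], len(support))
    dist.scatter_(1, (idx - 1).unsqueeze(1), (1 - w_hi).unsqueeze(1))
    dist.scatter_(1, idx.unsqueeze(1), w_hi.unsqueeze(1))
    return dist

def classification_q_loss(logits, td_targets):
    """Cross-entropy between predicted bin logits and two-hot TD targets."""
    return F.cross_entropy(logits, two_hot(td_targets))

def q_value(logits):
    """Scalar Q recovered as the expectation of the predicted bin distribution."""
    return (F.softmax(logits, dim=-1) * SUPPORT).sum(dim=-1)
```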

22 pages, 2186 KB  
Article
Deep Reinforcement Learning-Based Enhancement of Robotic Arm Target-Reaching Performance
by Ldet Honelign, Yoseph Abebe, Abera Tullu and Sunghun Jung
Actuators 2025, 14(4), 165; https://doi.org/10.3390/act14040165 - 26 Mar 2025
Cited by 3 | Viewed by 3192
Abstract
This work investigates the implementation of the Deep Deterministic Policy Gradient (DDPG) algorithm to enhance the target-reaching capability of the seven-degree-of-freedom (7-DoF) Franka Panda robotic arm. A simulated environment is established by employing OpenAI Gym, PyBullet, and Panda Gym. After 100,000 training time steps, the DDPG algorithm attains a success rate of 100% and an average reward of −1.8. The actor loss and critic loss values are 0.0846 and 0.00486, respectively, indicating improved decision-making and accurate value function estimation. The simulation results demonstrate the efficiency of DDPG in improving robotic arm performance, highlighting its potential for improving robotic arm manipulation.
(This article belongs to the Special Issue From Theory to Practice: Incremental Nonlinear Control)
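
A minimal training script in the spirit of this setup, assuming stable-baselines3 and panda-gym are installed; the environment ID, noise scale, and policy choice are assumptions (IDs vary across panda-gym versions), not the paper's exact configuration.

```python
import numpy as np
import gym
import panda_gym  # registers the Panda Gym environments on import
from stable_baselines3 import DDPG
from stable_baselines3.common.noise import NormalActionNoise

env = gym.make("PandaReach-v2")  # ID may differ across panda-gym versions
n_actions = env.action_space.shape[0]
# Gaussian exploration noise on the deterministic policy's actions.
action_noise = NormalActionNoise(mean=np.zeros(n_actions),
                                 sigma=0.1 * np.ones(n_actions))

# MultiInputPolicy handles the dict (goal-conditioned) observations.
model = DDPG("MultiInputPolicy", env, action_noise=action_noise, verbose=1)
model.learn(total_timesteps=100_000)  # matches the 100,000 steps reported
model.save("ddpg_panda_reach")
```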

27 pages, 1396 KB  
Article
The Cart-Pole Application as a Benchmark for Neuromorphic Computing
by James S. Plank, Charles P. Rizzo, Chris A. White and Catherine D. Schuman
J. Low Power Electron. Appl. 2025, 15(1), 5; https://doi.org/10.3390/jlpea15010005 - 26 Jan 2025
Cited by 8 | Viewed by 3189
Abstract
The cart-pole application is a well-known control application that is often used to illustrate reinforcement learning algorithms with conventional neural networks. An implementation of the application from OpenAI Gym is ubiquitous and popular. Spiking neural networks are the basis of brain-based, or neuromorphic, computing. They are attractive, especially as agents for control applications, because of their very low size, weight, and power requirements. We are motivated to help researchers in neuromorphic computing compare their work with common benchmarks, and in this paper we explore using the cart-pole application as a benchmark for spiking neural networks. We propose four parameter settings that scale the application in difficulty, in particular beyond the default parameter settings, which do not pose a difficult test for AI agents. We propose achievement levels for AI agents that are trained with these settings. Next, we perform an experiment that employs the benchmark and its difficulty levels to evaluate the effectiveness of eight neuroprocessor settings on success with the application. Finally, we perform a detailed examination of eight example networks from this experiment that achieve our goals on the difficulty levels, and comment on the features that enable them to be successful. Our goal is to help researchers in neuromorphic computing utilize the cart-pole application as an effective benchmark.
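
The classic Gym implementation exposes its physical constants as plain attributes, so difficulty scaling of the kind the paper proposes can be sketched by overriding them; the specific values below are illustrative, not the paper's four settings.

```python
import gym

def make_cartpole(pole_length=0.5, force_mag=10.0, tau=0.02):
    """CartPole-v1 with overridden physics (harder than the defaults).

    The classic Gym implementation exposes these attributes directly;
    the values here are illustrative, not the paper's four settings.
    """
    env = gym.make("CartPole-v1")
    inner = env.unwrapped
    inner.length = pole_length   # half-pole length; shorter falls faster
    inner.force_mag = force_mag  # weaker pushes reduce control authority
    inner.tau = tau              # coarser timestep makes control harder
    return env

# Example: a noticeably harder variant than the default settings.
hard_env = make_cartpole(pole_length=0.25, force_mag=5.0, tau=0.04)
obs = hard_env.reset()
```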

15 pages, 5609 KB  
Article
Rapidly Exploring Random Trees Reinforcement Learning (RRT-RL): A New Era in Training Sample Diversity
by István Péter, Bálint Kővári and Tamás Bécsi
Electronics 2025, 14(3), 443; https://doi.org/10.3390/electronics14030443 - 22 Jan 2025
Cited by 2 | Viewed by 3335
Abstract
Sample efficiency is a crucial problem in Reinforcement Learning, especially when tackling environments with sparse reward signals that make convergence and learning cumbersome. In this work, a novel method is developed that combines Rapidly Exploring Random Trees with Reinforcement Learning to mitigate the inefficiency of trial-and-error-based experience gathering through systematic exploration of the state space. The combined approach eliminates redundancy in irrelevant training samples. Consequently, the pivotal training signals, despite their sparsity, can be better exposed to support the learning process. Experiments are conducted on several OpenAI Gym environments to demonstrate that the proposed method has no context-dependent components, and the results show that it can outperform the classic trial-and-error-based training approach.
(This article belongs to the Special Issue Reinforcement Learning Meets Control: Theories and Applications)
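
A rough sketch of the underlying idea: grow a tree over visited states and harvest every extension as a training transition, so coverage comes from systematic expansion rather than trial and error. It assumes the environment can be snapshotted and restored; get_state/set_state are hypothetical wrappers that many Gym tasks would need, and this is not the paper's algorithm verbatim.

```python
import numpy as np

def rrt_gather(env, n_nodes=500):
    """Grow a tree over visited states and harvest transitions for training.

    Assumes the environment exposes get_state()/set_state() (hypothetical
    wrappers) so the tree can be extended from arbitrary nodes.
    """
    transitions = []
    start = env.reset()
    tree = [(start, env.get_state())]
    for _ in range(n_nodes):
        target = env.observation_space.sample()        # random exploration goal
        # Nearest tree node (in observation space) to the sampled target.
        obs0, snap = min(tree, key=lambda n: np.linalg.norm(n[0] - target))
        env.set_state(snap)
        action = env.action_space.sample()             # extend with a random action
        obs1, reward, done, _ = env.step(action)
        transitions.append((obs0, action, reward, obs1, done))
        if not done:
            tree.append((obs1, env.get_state()))       # new tree node
    return transitions
```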

27 pages, 9470 KB  
Article
Multi-Objective Dynamic Path Planning with Multi-Agent Deep Reinforcement Learning
by Mengxue Tao, Qiang Li and Junxi Yu
J. Mar. Sci. Eng. 2025, 13(1), 20; https://doi.org/10.3390/jmse13010020 - 27 Dec 2024
Cited by 8 | Viewed by 5228
Abstract
Multi-agent reinforcement learning (MARL) is characterized by its simple structure and strong adaptability, which has led to its widespread application in the field of path planning. To address the challenge of optimal path planning for mobile agent clusters in uncertain environments, a multi-objective dynamic path planning model (MODPP) based on multi-agent deep reinforcement learning (MADRL) is proposed. The model suits complex, unstable task environments prone to dimensionality explosion and offers scalability. The approach consists of two components, an action evaluation module and an action decision module, and uses a centralized training with decentralized execution (CTDE) architecture. During training, agents within the cluster learn cooperative strategies while being able to communicate with one another. Consequently, they can navigate through task environments without communication, achieving collision-free paths that globally optimize multiple sub-objectives: time, distance, and the overall cost associated with turning. Furthermore, in real task execution, agents acting as mobile entities can perform real-time obstacle avoidance. Finally, based on the OpenAI Gym platform, both simple and complex multi-objective environments were designed to analyze the rationality and effectiveness of the multi-objective dynamic path planning through minimum-cost and collision-risk assessments. The impact of reward function configuration on agent strategies is also discussed.
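
The CTDE pattern the abstract relies on is straightforward to sketch in PyTorch: each agent keeps a small actor over its local observation for execution, while a centralized critic scores joint observations and actions during training only. Layer sizes and architectures below are assumptions, not the paper's networks.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Decentralized policy: acts on one agent's local observation only."""
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, act_dim), nn.Tanh())

    def forward(self, obs):
        return self.net(obs)

class CentralCritic(nn.Module):
    """Centralized critic: scores joint observations and joint actions.

    Used during training only; at execution time each agent runs its Actor.
    """
    def __init__(self, n_agents, obs_dim, act_dim, hidden=128):
        super().__init__()
        joint = n_agents * (obs_dim + act_dim)
        self.net = nn.Sequential(nn.Linear(joint, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, all_obs, all_acts):
        # all_obs: (batch, n_agents, obs_dim); all_acts: (batch, n_agents, act_dim)
        x = torch.cat([all_obs.flatten(1), all_acts.flatten(1)], dim=1)
        return self.net(x)
```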

17 pages, 2872 KB  
Article
Discrete Space Deep Reinforcement Learning Algorithm Based on Support Vector Machine Recursive Feature Elimination
by Chayoung Kim
Symmetry 2024, 16(8), 940; https://doi.org/10.3390/sym16080940 - 23 Jul 2024
Cited by 3 | Viewed by 2342
Abstract
Algorithms for training agents with experience replay have advanced in several domains, primarily because prioritized experience replay (PER), developed from the double deep Q-network (DDQN) in deep reinforcement learning (DRL), has become a standard. PER-based algorithms have achieved significant success in the image and video domains. However, the exceptional results observed in images and videos do not carry over to many domains with simple action spaces and relatively small states, particularly discrete action spaces with sparse rewards. Moreover, most advanced techniques improve sampling efficiency through deep learning algorithms rather than reinforcement learning itself, and there is growing evidence that deep learning algorithms cannot generalize during training. Therefore, this study proposes an algorithm suitable for discrete action space environments that uses the sample efficiency of PER based on DDQN but incorporates support vector machine recursive feature elimination (SVM-RFE) without enhancing the sampling efficiency through deep learning algorithms. The proposed algorithm exhibited considerable performance improvements in classical OpenAI Gym environments that do not use images or videos as inputs. In particular, simple discrete space environments with reflection symmetry, such as Cart–Pole, exhibited a faster and more stable learning process. These results suggest that applying SVM-RFE, which leverages the orthogonality of support vector machines (SVMs) across learning patterns, can be appropriate when the data in the reinforcement learning environment exhibit symmetry.
(This article belongs to the Section Mathematics)
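
A minimal sketch of the SVM-RFE step using scikit-learn, ranking state dimensions from replayed (state, action) pairs as a supervised proxy task; how the resulting mask feeds back into the DDQN inputs is an assumption of this sketch, not the paper's pipeline.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.feature_selection import RFE

def select_state_features(states, actions, n_keep=2):
    """Rank state dimensions with SVM-RFE and keep the most informative ones."""
    estimator = SVC(kernel="linear")  # linear kernel exposes coef_ for RFE
    selector = RFE(estimator, n_features_to_select=n_keep, step=1)
    selector.fit(np.asarray(states), np.asarray(actions))
    return selector.support_          # boolean mask over state dimensions

# Example on Cart-Pole-shaped data: 4 state dims, discrete actions as labels.
states = np.random.randn(256, 4)
actions = (states[:, 2] > 0).astype(int)  # toy labels for illustration only
mask = select_state_features(states, actions, n_keep=2)
```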

17 pages, 7133 KB  
Article
Deep-Reinforcement-Learning-Based Motion Planning for a Wide Range of Robotic Structures
by Roman Parák, Jakub Kůdela, Radomil Matoušek and Martin Juříček
Computation 2024, 12(6), 116; https://doi.org/10.3390/computation12060116 - 5 Jun 2024
Cited by 7 | Viewed by 8422
Abstract
The use of robot manipulators in engineering applications and scientific research has significantly increased in recent years. This can be attributed to the rise of technologies such as autonomous robotics and physics-based simulation, along with the utilization of artificial intelligence techniques. The use of these technologies may be limited by a focus on a specific type of robotic manipulator and a particular task being solved, which can hinder modularity and reproducibility in future expansions. This paper presents a method for planning motion across a wide range of robotic structures using deep reinforcement learning (DRL) algorithms to solve the problem of reaching a static or random target within a pre-defined configuration space. The paper addresses the challenge of motion planning in environments under a variety of conditions, including environments with and without the presence of collision objects. It highlights the versatility and potential for future expansion through the integration of OpenAI Gym and the PyBullet physics-based simulator.
(This article belongs to the Special Issue 10th Anniversary of Computation—Computational Engineering)