MDPI - Publisher of Open Access Journals

20 pages, 10034 KB

Open Access Article

A Two-Wheel-Centric Reconfigurable Mobility Platform Enabled by Compact Steering–Drive–Suspension Modules: Balance, Driving, and Cooperative Transport

by Junghyun Choi

Machines 2026, 14(6), 704; https://doi.org/10.3390/machines14060704 (registering DOI) - 19 Jun 2026

Abstract

Modern logistics and manufacturing environments simultaneously demand mobility platforms that are compact enough to navigate narrow aisles and powerful enough to transport oversized or heavy components. We previously developed a compact Steering–Drive–Suspension (SDS) module that integrates steering, in-wheel drive, and suspension within a single wheel envelope, achieving $\pm 90^{\circ}$ wide-angle steering with a single actuator. The present paper extends that hardware-centric work by treating the two-wheel (2WD) configuration assembled from two SDS modules as the unit module of the platform, building a four-wheel (4WD) operation by coupling two such 2WD units, and developing a unified balance and impedance-based control scheme. We derive a cart–pole inverted-pendulum model for the 2WD configuration and a planar 2-DOF bicycle model for the coupled and cooperative configurations, with full controllability proof and quantitative LQR robustness margins. Three Python 3.12 based scenarios validate the framework: (i) a 2WD inverted-pendulum tracking task, (ii) a forward and lateral relocation maneuver compared across SDS Crab, Ackermann, and four-wheel-steering modes, and (iii) cooperative transport of a 100 kg steel plate by two impedance-coupled 2WD units. Across all scenarios the proposed controllers achieve sub-centimetre tracking gap, pitch deviation within $\pm 2^{\circ}$, and well-damped cooperative behavior without payload sloshing. The results substantiate the central design claim that the SDS module’s compactness enables a single hardware platform to act simultaneously as an autonomous small-payload mover, a building block of a 4WD platform, and a cooperative agent for oversized loads. Full article

(This article belongs to the Special Issue Advances in Automotive Mechatronics)

20 pages, 3452 KB

Open Access Article

Effectiveness of Experience-Sharing Group Learning in Deep Reinforcement Learning

by Keita Muroya, Makoto Ikeda and Akira Notsu

Appl. Sci. 2026, 16(7), 3250; https://doi.org/10.3390/app16073250 - 27 Mar 2026

Viewed by 487

Abstract

Deep reinforcement learning faces a critical trade-off between computational cost and performance. This study proposes an experience-sharing group-learning framework in which multiple agents with different network sizes collaboratively learn a single task through a shared experience replay memory. Unlike conventional multi-agent approaches that assume homogeneous agents, our method enables agents with different computational capabilities to share experiences, allowing low-performance agents to benefit from high-performance agents’ quality experiences. The proposed method was evaluated in CartPole and Super Mario Bros environments. In CartPole two-agent experiments, the low-performance agent (Agent16, 404 parameters) achieved approximately 2× performance improvement (93.3 to 184.4 steps) through group learning, while the high-performance agent (Agent64, 4676 parameters) maintained comparable performance, though several group conditions fell below the solo 200-step result. Three-agent experiments further improved Agent16 to 196.5 steps with reduced variance. Under step-matched comparisons in Super Mario Bros, the low-capacity agent benefits from experience sharing beyond solo baselines that consume roughly twice as many steps, while the high-capacity agent remains broadly comparable between group and solo. Claims are limited to step-based normalisation. Q-value analysis revealed accelerated early learning, with Q-values increasing by +10.1 (Mario) and +7.7 (Luigi) at 1 million steps. These results demonstrate that experience-sharing group learning can improve learning efficiency for resource-constrained agents under a fixed environment-step budget. Full article

(This article belongs to the Special Issue Advances in Intelligent Systems—2nd edition)

25 pages, 2297 KB

Open Access Article

A Multi-Agent Advisory Board Reinforcement Learning Framework for Adaptive Cooperative Control

by Onur Osman, Tolga Kudret Karaca, Bahar Yalcin Kavus, Gokalp Tulum and Sajjad Nematzadeh

Algorithms 2026, 19(3), 230; https://doi.org/10.3390/a19030230 - 18 Mar 2026

Viewed by 573

Abstract

This study proposes Advisory Board Reinforcement Learning (AdvB-RL), a cooperative reinforcement-learning framework that integrates multiple advisory neural networks to guide policy optimization. Unlike conventional single-agent architectures, AdvB-RL maintains a set of independently trained advisory networks that contribute to action selection through a dynamic aggregation mechanism. This design preserves diverse experiential knowledge while improving learning stability and the exploration–exploitation balance. The framework is evaluated on three benchmark control tasks, namely LunarLander-v2, CartPole-v1, and MountainCar-v0, using advisory board sizes of 1, 5, and 10 members against a Double Deep Q-Network (DDQN) baseline. The best-performing configuration, 10 AdvB, achieved 270.02 ± 24.74 on LunarLander-v2 versus 227.92 ± 86.02 for DDQN, 497.79 ± 5.18 on CartPole-v1 versus 304.37 ± 144.04, and −103.16 ± 15.46 on MountainCar-v0 versus −130.71 ± 31.64, indicating higher returns together with markedly lower variability. Across the three environments, these results show that increasing the number of advisory members improves both reward consistency and overall robustness, with the 10-member setting providing the strongest performance. Within the tested configurations, the advisory board mechanism remains computationally feasible, while preliminary experiments beyond 10 advisors show diminishing returns relative to added complexity. Overall, AdvB-RL provides a robust and modular alternative to single-policy reinforcement learning for adaptive cooperative control. Full article

19 pages, 6049 KB

Open Access Article

Optimized Design of a Permanent Magnet Machine for Golf Carts Under Multiple Operating Conditions

by Wenye Wu, Donghui Li and Weifeng Wang

World Electr. Veh. J. 2025, 16(12), 680; https://doi.org/10.3390/wevj16120680 - 18 Dec 2025

Cited by 1 | Viewed by 665

Abstract

In response to the growing demand for efficient and eco-friendly golf carts, this paper presents an optimized design of a permanent magnet synchronous machine (PMSM) for multiple operating conditions. The application scenarios of the golf cart were first analyzed, identifying the power requirements under three driving conditions such as unloaded on flat roads, fully loaded on flat roads, and fully loaded on slopes. Then, a 36-slot 8-pole interior PMSM is developed, and a systematic two-stage optimization strategy using a Multi-Objective Genetic Algorithm (MOGA) is applied to enhance both no-load and rated-load performance. By adjusting key rotor parameters to balance competing objectives, the optimized machine demonstrates notable improvements in cogging torque reduction, output torque, torque ripple minimization, and operational efficiency. Specifically, the results show that the optimized machine achieves a cogging torque reduction of over 60%, an increase in maximum output torque by 7.3%, and a peak efficiency improvement of 1.2 percentage points under high-load conditions. Experimental results validate the effectiveness of the design and confirm its suitability for the complex operating conditions of golf carts. Full article

(This article belongs to the Section Propulsion Systems and Components)

26 pages, 4507 KB

Open Access Article

A Hybrid Type-2 Fuzzy Double DQN with Adaptive Reward Shaping for Stable Reinforcement Learning

by Hadi Mohammadian KhalafAnsar, Jaime Rohten and Jafar Keighobadi

AI 2025, 6(12), 319; https://doi.org/10.3390/ai6120319 - 6 Dec 2025

Viewed by 1260

Abstract

Objectives: This paper presents an innovative control framework for the classical Cart–Pole problem. Methods: The proposed framework combines Interval Type-2 Fuzzy Logic, the Dueling Double DQN deep reinforcement learning algorithm, and adaptive reward shaping techniques. Specifically, fuzzy logic acts as an a priori knowledge layer that incorporates measurement uncertainty in both angle and angular velocity, allowing the controller to generate adaptive actions dynamically. Simultaneously, the deep Q-network is responsible for learning the optimal policy. To ensure stability, the Double DQN mechanism successfully alleviates the overestimation bias commonly observed in value-based reinforcement learning. An accelerated convergence mechanism is achieved through a multi-component reward shaping function that prioritizes angle stability and survival. Results: Given the training results, the method stabilizes rapidly; it achieves a 100% success rate by episode 20 and maintains consistent high rewards (650–700) throughout training. While Standard DQN and other baselines take 100+ episodes to become reliable, our method converges in about 20 episodes (4–5 times faster). It is observed that in comparison with advanced baselines like C51 or PER, the proposed method is about 15–20% better in final performance. We also found that PPO and QR-DQN surprisingly struggle on this task, highlighting the need for stability mechanisms. Conclusions: The proposed approach provides a practical solution that balances exploration with safety through the integration of fuzzy logic and deep reinforcement learning. This rapid convergence is particularly important for real-world applications where data collection is expensive, achieving stable performance much faster than existing methods without requiring complex theoretical guarantees. Full article

29 pages, 7081 KB

Open Access Article

Q-Learning for Online PID Controller Tuning in Continuous Dynamic Systems: An Interpretable Framework for Exploring Multi-Agent Systems

by Davor Ibarra-Pérez, Sergio García-Nieto and Javier Sanchis Saez

Mathematics 2025, 13(21), 3461; https://doi.org/10.3390/math13213461 - 30 Oct 2025

Cited by 1 | Viewed by 1897

Abstract

This study proposes a discrete multi-agent Q-learning framework for the online tuning of PID controllers in continuous dynamic systems with limited observability. The approach treats the adjustment of each PID gain ($k_{p}$, $k_{i}$, $k_{d}$) as an independent learning process, in which each agent operates within a discrete state space corresponding to its own gain and selects actions from a tripartite space (decrease, maintain, or increase its gain). The agents act simultaneously under fixed decision intervals, favoring their convergence by preserving quasi-stationary conditions of the perceived environment, while a shared cumulative global reward, composed of system parameters, time and control action penalties, and stability incentives, guides coordinated exploration toward control objectives. Implemented in Python, the framework was validated in two nonlinear control problems: a water-tank and inverted pendulum (cart-pole) systems. The agents achieved their initial convergence after approximately 300 and 500 episodes, respectively, with overall success rates of 49.6% and 46.2% in 5000 training episodes. The learning process exhibited sustained convergence toward effective PID configurations capable of stabilizing both systems without explicit dynamic models. These findings confirm the feasibility of the proposed low-complexity discrete reinforcement learning approach for online adaptive PID tuning, achieving interpretable and reproducible control policies and providing a new basis for future hybrid schemes that unite classical control theory and reinforcement learning agents. Full article

(This article belongs to the Special Issue AI, Machine Learning and Optimization)

17 pages, 1163 KB

Open Access Article

Decoupled Reinforcement Hybrid PPO–Sliding Control for Underactuated Systems: Application to Cart–Pole and Acrobot

by Yi-Jen Mon

Machines 2025, 13(7), 601; https://doi.org/10.3390/machines13070601 - 11 Jul 2025

Viewed by 1474

Abstract

Underactuated systems, such as the Cart–Pole and Acrobot, pose significant control challenges due to their inherent nonlinearity and limited actuation. Traditional control methods often struggle to achieve stable and optimal performance in these complex scenarios. This paper presents a novel stable reinforcement learning (RL) approach for underactuated systems, integrating advanced exploration–exploitation mechanisms and a refined policy optimization framework to address instability issues in RL-based control. The proposed method is validated through extensive experiments on two benchmark underactuated systems: the Cart–Pole and Acrobot. In the Cart–Pole task, the method achieves long-term balance with high stability, outperforming traditional RL algorithms such as the Proximal Policy Optimization (PPO) in average episode length and robustness to environmental disturbances. For the Acrobot, the approach enables reliable swing-up and near-vertical stabilization but cannot achieve sustained balance control beyond short time intervals due to residual dynamics and control limitations. A key contribution is the development of a hybrid PPO–sliding mode control strategy that enhances learning efficiency and stabilities for underactuated systems. Full article

(This article belongs to the Special Issue Robotic Intelligence Development of AI in Robot Perception, Learning, and Decision)

11 pages, 1425 KB

Open Access Feature Paper Article

Invariant-Based Inverse Engineering for Balanced Displacement of a Cartpole System

by Ion Lizuain, Ander Tobalina and Alvaro Rodriguez-Prieto

Mathematics 2025, 13(8), 1220; https://doi.org/10.3390/math13081220 - 8 Apr 2025

Cited by 1 | Viewed by 991

Abstract

Adiabaticity is a key concept in physics, but its applications in mechanical and control engineering remain underexplored. Adiabatic invariants ensure robust dynamics under slow changes, but they impose impractical time limitations. Shortcuts to Adiabaticity (STA) overcome these limitations by enabling fast operations with minimal final excitations. In this work, we set a STA strategy based on dynamical invariants and inverse engineering to design the trajectory of a cartpole, a system characterized by its instability and repulsive potential. The trajectories found guarantee a balanced transport of the cartpole within the small oscillations regime. The results are compared to numerical simulations with the exact non-linear model to set the working domain of the designed protocol. Full article

(This article belongs to the Special Issue Mathematical Modeling and Simulation of Oscillatory Phenomena, 2nd Edition)

32 pages, 1250 KB

Open Access Article

Exploration-Driven Genetic Algorithms for Hyperparameter Optimisation in Deep Reinforcement Learning

by Bartłomiej Brzęk, Barbara Probierz and Jan Kozak

Appl. Sci. 2025, 15(4), 2067; https://doi.org/10.3390/app15042067 - 16 Feb 2025

Cited by 7 | Viewed by 4056

Abstract

This paper investigates the application of genetic algorithms (GAs) for hyperparameter optimisation in deep reinforcement learning (RL), focusing on the Deep Q-Learning (DQN) algorithm. This study aims to identify approaches that enhance RL model performance through the effective exploration of the configuration space. By comparing different GA methods for selection, crossover, and mutation, this study focuses on deep RL models. The results indicate that GA techniques emphasising the exploration of the configuration space yield significant improvements in optimisation efficiency, reducing training time and enhancing convergence. The most effective GA improved the fitness function value from 68.26 (initial best chromosome) to 979.16 after 200 iterations, demonstrating the efficacy of the proposed approach. Furthermore, variations in specific hyperparameters, such as learning rate, gamma, and update frequency, were shown to substantially affect the DQN model’s learning ability. These findings suggest that exploration-driven GA strategies outperform GA approaches with limited exploration, underscoring the critical role of selection and crossover methods in enhancing DQN model efficiency and performance. Moreover, a mini case study on the CartPole environment revealed that even a 5% sensor dropout impaired the performance of a GA-optimised RL agent, while a 20% dropout almost entirely halted improvements. Full article

(This article belongs to the Special Issue Recent Advances in Automated Machine Learning: 2nd Edition)

27 pages, 1396 KB

Open Access Article

The Cart-Pole Application as a Benchmark for Neuromorphic Computing

by James S. Plank, Charles P. Rizzo, Chris A. White and Catherine D. Schuman

J. Low Power Electron. Appl. 2025, 15(1), 5; https://doi.org/10.3390/jlpea15010005 - 26 Jan 2025

Cited by 8 | Viewed by 3411

Abstract

The cart-pole application is a well-known control application that is often used to illustrate reinforcement learning algorithms with conventional neural networks. An implementation of the application from OpenAI Gym is ubiquitous and popular. Spiking neural networks are the basis of brain-based, or neuromorphic computing. They are attractive, especially as agents for control applications, because of their very low size, weight and power requirements. We are motivated to help researchers in neuromorphic computing to be able to compare their work with common benchmarks, and in this paper we explore using the cart-pole application as a benchmark for spiking neural networks. We propose four parameter settings that scale the application in difficulty, in particular beyond the default parameter settings which do not pose a difficult test for AI agents. We propose achievement levels for AI agents that are trained with these settings. Next, we perform an experiment that employs the benchmark and its difficulty levels to evaluate the effectiveness of eight neuroprocessor settings on success with the application. Finally, we perform a detailed examination of eight example networks from this experiment, that achieve our goals on the difficulty levels, and comment on features that enable them to be successful. Our goal is to help researchers in neuromorphic computing to utilize the cart-pole application as an effective benchmark. Full article

(This article belongs to the Special Issue Advances in Low Power Neuromorphic Computing: Models, Algorithms, and Applications)

17 pages, 2872 KB

Open Access Article

Discrete Space Deep Reinforcement Learning Algorithm Based on Support Vector Machine Recursive Feature Elimination

by Chayoung Kim

Symmetry 2024, 16(8), 940; https://doi.org/10.3390/sym16080940 - 23 Jul 2024

Cited by 3 | Viewed by 2392

Abstract

Algorithms for training agents with experience replay have advanced in several domains, primarily because prioritized experience replay (PER) developed from the double deep Q-network (DDQN) in deep reinforcement learning (DRL) has become a standard. PER-based algorithms have achieved significant success in the image and video domains. However, the exceptional results observed in images and videos are not as effective in many domains with simple action spaces and relatively small states, particularly in discrete action spaces with sparse rewards. Moreover, most advanced techniques may improve sampling efficiency using deep learning algorithms rather than reinforcement learning. However, there is growing evidence that deep learning algorithms cannot generalize during training. Therefore, this study proposes an algorithm suitable for discrete action space environments that uses the sample efficiency of PER based on DDQN but incorporates support vector machine recursive feature elimination (SVM-RFE) without enhancing the sampling efficiency through deep learning algorithms. The proposed algorithm exhibited considerable performance improvements in classical OpenAI Gym environments that did not use images or videos as inputs. In particular, simple discrete space environments with reflection symmetry, such as Cart–Pole, exhibited a faster and more stable learning process. These results suggest that the application of SVM-RFE, which leverages the orthogonality of support vector machines (SVMs) across learning patterns, can be appropriate when the data in the reinforcement learning environment demonstrate symmetry. Full article

(This article belongs to the Section Mathematics)

17 pages, 766 KB

Open Access Article

Robust and Exponential Stabilization of a Cart–Pendulum System via Geometric PID Control

by Zhifei Zhang, Miaoxu Fang, Minrui Fei and Jinrong Li

Symmetry 2024, 16(1), 94; https://doi.org/10.3390/sym16010094 - 11 Jan 2024

Cited by 2 | Viewed by 3141

Abstract

This paper addresses the robust stabilization problem of a cart–pole system. The controlled dynamics of this interconnected system are deduced by following the analytic framework of Lagrangian mechanics, and the residual terms are formulated as a bias depending on the angle and angular velocity. A geometric definition of the Proportional–Integral–Derivative (PID) control algorithm is proposed, and a Lyapunov function is explicitly constructed through two stages of variable change. Local exponential stability of the stable equilibrium is proved, and a criterion for parameter tuning is provided by ensuring an exponential decrease in the Lyapunov function. Enlarging the control parameters to infinity allows for the extension of the attraction region almost to the half circle. The effectiveness of the geometric PID controller and the local exponential stability of the resulting closed-loop system are verified by simulating a numerical example. Full article

9 pages, 468 KB

Open Access Article

Optimal Shortcuts to Adiabatic Control by Lagrange Mechanics

by Lanlan Ma and Qian Kong

Entropy 2023, 25(5), 719; https://doi.org/10.3390/e25050719 - 26 Apr 2023

Cited by 4 | Viewed by 2191

Abstract

We combined an inverse engineering technique based on Lagrange mechanics and optimal control theory to design an optimal trajectory that can transport a cartpole in a fast and stable way. For classical control, we used the relative displacement between the ball and the trolley as the controller to study the anharmonic effect of the cartpole. Under this constraint, we used the time minimization principle in optimal control theory to find the optimal trajectory, and the solution of time minimization is the bang-bang form, which ensures that the pendulum is in a vertical upward position at the initial and the final moments and oscillates in a small angle range. Full article

(This article belongs to the Special Issue Quantum Control and Quantum Computing)

17 pages, 4023 KB

Open Access Article

Signal Novelty Detection as an Intrinsic Reward for Robotics

by Martin Kubovčík, Iveta Dirgová Luptáková and Jiří Pospíchal

Sensors 2023, 23(8), 3985; https://doi.org/10.3390/s23083985 - 14 Apr 2023

Cited by 5 | Viewed by 3579

Abstract

In advanced robot control, reinforcement learning is a common technique used to transform sensor data into signals for actuators, based on feedback from the robot’s environment. However, the feedback or reward is typically sparse, as it is provided mainly after the task’s completion or failure, leading to slow convergence. Additional intrinsic rewards based on the state visitation frequency can provide more feedback. In this study, an Autoencoder deep learning neural network was utilized as novelty detection for intrinsic rewards to guide the search process through a state space. The neural network processed signals from various types of sensors simultaneously. It was tested on simulated robotic agents in a benchmark set of classic control OpenAI Gym test environments (including Mountain Car, Acrobot, CartPole, and LunarLander), achieving more efficient and accurate robot control in three of the four tasks (with only slight degradation in the Lunar Lander task) when purely intrinsic rewards were used compared to standard extrinsic rewards. By incorporating autoencoder-based intrinsic rewards, robots could potentially become more dependable in autonomous operations like space or underwater exploration or during natural disaster response. This is because the system could better adapt to changing environments or unexpected situations. Full article

(This article belongs to the Special Issue Intelligent Sensing System and Robotics)

15 pages, 1696 KB

Open Access Article

Towards a Broad-Persistent Advising Approach for Deep Interactive Reinforcement Learning in Robotic Environments

by Hung Son Nguyen, Francisco Cruz and Richard Dazeley

Sensors 2023, 23(5), 2681; https://doi.org/10.3390/s23052681 - 1 Mar 2023

Cited by 1 | Viewed by 3251

Abstract

Deep Reinforcement Learning (DeepRL) methods have been widely used in robotics to learn about the environment and acquire behaviours autonomously. Deep Interactive Reinforcement Learning (DeepIRL) includes interactive feedback from an external trainer or expert giving advice to help learners choose actions to speed up the learning process. However, current research has been limited to interactions that offer actionable advice to only the current state of the agent. Additionally, the information is discarded by the agent after a single use, which causes a duplicate process at the same state for a revisit. In this paper, we present Broad-Persistent Advising (BPA), an approach that retains and reuses the processed information. It not only helps trainers give more general advice relevant to similar states instead of only the current state, but also allows the agent to speed up the learning process. We tested the proposed approach in two continuous robotic scenarios, namely a cart pole balancing task and a simulated robot navigation task. The results demonstrated that the agent’s learning speed increased, as evidenced by the rising reward points of up to 37%, while maintaining the number of interactions required for the trainer, in comparison to the DeepIRL approach. Full article

(This article belongs to the Special Issue Advances in Intelligent Robotics Systems Based Machine Learning)

Search Results (26)
