Search Results (177)

Search Parameters:
Keywords = twin-delayed deep deterministic policy gradient

21 pages, 6210 KB  
Article
Robust Path Planning via Deep Reinforcement Learning
by Daeyeol Kang, Jongyoon Park and Pileun Kim
Sensors 2026, 26(9), 2658; https://doi.org/10.3390/s26092658 - 24 Apr 2026
Abstract
Deep reinforcement learning (DRL) for autonomous mobile robot navigation faces several inherent limitations. The stochastic nature of actions generated by DRL policies can undermine performance consistency, while inefficient exploration frequently delays the learning process or prevents the discovery of optimal solutions. This research aims to enhance the robustness of path planning by addressing these challenges. To achieve this goal, we propose a hybrid approach that integrates the flexible decision-making capabilities of deep reinforcement learning with the stability of traditional path planning. The proposed model adopts the Twin Delayed Deep Deterministic Policy Gradient (TD3) network as its base. Notably, we pre-process LiDAR point cloud data to extract only essential features for the state representation, thereby preventing performance degradation from high-dimensional inputs and improving computational efficiency. Our model optimizes the learning process through two core strategies. First, it prioritizes experience data generated during training based on negative rewards, guiding the model to learn more frequently from critical failures rather than redundant successes. Second, it dynamically compares the action proposed by the TD3 network with a goal-oriented action from a classical path-planning algorithm in real time. By selecting the action with the higher estimated value, the model guides the policy toward a stable and effective trajectory from the earliest stages of training. To validate the efficacy of our approach, we conducted simulation-based experiments comparing the performance of the proposed model with existing reinforcement learning networks. To ensure statistical significance and mitigate the impact of random initialization, all reported results are averaged over 10 independent runs with different random seeds. The results quantitatively demonstrate that our model achieves significantly higher and more stable reward values, confirming a robust improvement in the path-planning process.
(This article belongs to the Special Issue Advancements in Autonomous Navigation Systems for UAVs)
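The value-based arbitration described in this abstract — executing whichever of the TD3 actor's proposal or the classical planner's goal-oriented action the critic scores higher — can be sketched as below. This is an illustrative assumption, not the authors' code: the quadratic `q_value` stub stands in for TD3's learned critic, and all names are invented.

```python
def q_value(state, action):
    # Stand-in for the critic; TD3 would use the minimum of two learned
    # Q-networks. Here, a toy quadratic preferring actions close to the
    # goal direction (illustrative assumption).
    return -(action - state["goal_dir"]) ** 2

def select_action(state, td3_action, planner_action):
    """Execute whichever candidate action the critic values higher."""
    if q_value(state, td3_action) >= q_value(state, planner_action):
        return td3_action
    return planner_action

state = {"goal_dir": 0.5}
# Early in training the untrained actor drifts; the planner's
# goal-oriented action wins the value comparison.
chosen = select_action(state, td3_action=-0.8, planner_action=0.5)
```

Under this scheme the planner's action tends to win early, steering the policy toward stable trajectories; as the critic and actor improve, the TD3 action is selected more often.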
31 pages, 1487 KB  
Article
Deep Reinforcement Learning-Based Dual-Loop Adaptive Control Method and Simulation for Loitering Munition Fuze
by Lingyun Zhang, Haojie Li, Chuanhao Zhang, Yuan Zhao, Shixiang Qiao and Hang Yu
Technologies 2026, 14(4), 239; https://doi.org/10.3390/technologies14040239 - 20 Apr 2026
Abstract
To address the poor adaptability and rigid initiation modes of the loitering munition fuze in complex environments and the inadequacy of single fuzzy control against strong interference, this paper proposes a dual-loop adaptive reconfiguration control method. The architecture integrates the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm with fuzzy logic. The inner loop uses TD3 to dynamically optimize fuzzy scaling factors based on real-time interference and state deviations. Concurrently, the outer loop utilizes a Fuze Readiness Index (FRI) and a finite state machine to manage real-time multi-modal mission switching (e.g., proximity, delay, and airburst) and reverse safety-state conversions. Co-simulations under non-stationary composite interference show that the proposed method reduces burst-height RMSE by 82.4% and 61.6% relative to the fixed-threshold and standard fuzzy baselines, respectively. The false alarm rate (FAR) is reduced to 0.15%, and the reconfiguration response time under sudden interference is shortened to 12 ms. Even under extreme conditions, such as a 400 ms sensor signal loss, the relative error remains within 5%. These simulation results demonstrate the architecture's potential to improve precision, responsiveness, and robustness under dynamic interference, including intermittent observation loss, within the simulated operating envelope.
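The outer loop's mode management can be pictured as a small finite state machine keyed on a readiness index. The sketch below is a loose illustration only: the thresholds, mode names, and `sensors_ok` flag are invented assumptions, not the paper's FRI definition or state set.

```python
def next_mode(mode, fri, sensors_ok):
    # Toy finite state machine over initiation modes. Losing valid
    # sensing triggers the reverse safety-state conversion; otherwise
    # the readiness index (FRI) selects among the multi-modal missions.
    # Thresholds and mode names are illustrative assumptions.
    if not sensors_ok:
        return "safe"
    if fri >= 0.8:
        return "airburst"
    if fri >= 0.5:
        return "proximity"
    return "delay"

mode = next_mode("delay", fri=0.9, sensors_ok=True)
fallback = next_mode("airburst", fri=0.9, sensors_ok=False)
```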
22 pages, 919 KB  
Article
Large Autonomous Driving Overtaking Decision and Control System Based on Hierarchical Reinforcement Learning
by Chen-Ning Wang and Xiuhui Tang
Electronics 2026, 15(8), 1711; https://doi.org/10.3390/electronics15081711 - 17 Apr 2026
Abstract
To address the bottlenecks of low sample efficiency and poor control accuracy in traditional single-layer reinforcement learning during autonomous driving overtaking, this paper proposes an overtaking decision and control system based on hierarchical reinforcement learning to decouple complex tasks in spatial and temporal dimensions. A heterogeneous two-layer architecture is constructed, where the upper layer adopts the Proximal Policy Optimization algorithm to generate macroscopic discrete decisions, while the lower layer employs Twin Delayed Deep Deterministic Policy Gradient combined with Long Short-Term Memory to achieve smooth continuous control of steering and acceleration by perceiving temporal features of dynamic obstacles. A composite reward mechanism, integrating hard safety constraints and soft efficiency incentives, is designed to balance safety, efficiency, and comfort. Experimental results in complex scenarios with multiple interfering vehicles and random lane-changing behaviors demonstrate that the proposed system improves the training convergence speed by approximately 30% within 500,000 steps compared to single-layer algorithms. In tests across varying traffic densities, the system achieves a 98.3% success rate in medium-density scenarios with a collision rate of only 0.6%. In high-density challenges, the success rate remains above 95%, with the collision rate reduced by about 80% compared to baseline models. Furthermore, the lateral control deviation is strictly limited to within 0.2 m, and the longitudinal safety distance remains stable above 5 m. This system provides a robust, high-efficiency paradigm for autonomous overtaking.

26 pages, 6083 KB  
Article
Gait Optimization Control of Spinal Quadruped Robot Based on Deep Reinforcement Learning
by Guozheng Song, Qinglin Ai, Lin Li, Xiaohang Shan, Chao Yang and Jianguo Yang
Sensors 2026, 26(8), 2407; https://doi.org/10.3390/s26082407 - 14 Apr 2026
Abstract
The spine enhances the flexibility of quadrupeds during locomotion. Inspired by this biological mechanism, this study incorporates an actuated spinal joint into a quadruped robot, enabling more natural motion and posture adjustment. To improve the motion stability of spinal robots in complex environments, a deep reinforcement learning framework that integrates a central pattern generator (CPG) with the twin delayed deep deterministic policy gradient (TD3) algorithm is proposed to optimize the gait motion of the spinal quadruped robot. First, the structure and parameters of the quadruped robot with a spinal joint are analyzed and a CPG coupling model incorporating spinal motion parameters is designed. Subsequently, a TD3–CPG algorithm framework based on a joint incremental strategy is proposed to optimize the robot’s gait, exploring optimal control strategies for terrain adaptation through spinal motion integration. Finally, experiments are conducted on various obstacle terrains to validate the proposed algorithm. Simulation and experimental results demonstrate the effectiveness of the algorithm in optimizing the gait of the spinal quadruped robot, showing significant improvements in walking stability, speed, and terrain adaptability across different terrains.
(This article belongs to the Section Sensors and Robotics)
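A CPG is typically built from coupled limit-cycle oscillators. A minimal single Hopf oscillator — a common CPG building block, not the paper's specific spinal coupling model — can be sketched as below; `mu` sets the squared limit-cycle amplitude and `omega` the gait frequency, and the values are illustrative.

```python
import math

def hopf_step(x, y, dt=0.001, mu=1.0, omega=2.0 * math.pi):
    # One explicit-Euler step of a Hopf oscillator. The state converges
    # to a stable limit cycle of radius sqrt(mu), producing a rhythmic
    # joint-drive signal regardless of the initial condition.
    r2 = x * x + y * y
    dx = (mu - r2) * x - omega * y
    dy = (mu - r2) * y + omega * x
    return x + dt * dx, y + dt * dy

x, y = 0.1, 0.0
for _ in range(5000):
    x, y = hopf_step(x, y)
# After the transient, (x, y) oscillates on a circle of radius ~sqrt(mu).
```

Coupling several such oscillators with phase offsets yields the inter-leg (and, here, spinal) coordination that the RL policy then modulates.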

27 pages, 729 KB  
Article
RSMA-Assisted Fluid Antenna ISAC via Hierarchical Deep Reinforcement Learning
by Muhammad Sheraz, Teong Chee Chuah and It Ee Lee
Telecom 2026, 7(2), 41; https://doi.org/10.3390/telecom7020041 - 9 Apr 2026
Abstract
Integrated sensing and communications (ISAC) requires tight coordination between spatial signal design and multiple-access strategies to balance communication throughput and sensing accuracy under shared spectral and hardware constraints. However, existing ISAC frameworks with rate-splitting multiple access (RSMA) typically rely on fixed antenna arrays and decoupled optimization, which fundamentally limit their ability to adapt to fast channel variations and dynamic sensing requirements. This paper introduces a fluid antenna-enabled RSMA-assisted ISAC architecture, in which movable antenna ports are exploited as a new spatial degree of freedom to enhance adaptability in both communication and sensing operations. Fluid antenna systems (FAS) are deployed at both the base station and user terminals, allowing dynamic port selection that reshapes the effective channel and sensing beampattern in real time. We formulate a joint sum-rate maximization problem subject to explicit sensing-quality constraints, capturing the coupled impact of antenna port selection, RSMA rate allocation, and multi-beam transmit design. The proposed framework maximizes the communication sum-rate while ensuring that the sensing functionality satisfies a predefined sensing quality constraint. This constraint-based ISAC formulation guarantees that sufficient sensing power is directed toward the target while optimizing communication performance. The resulting optimization involves strongly coupled discrete and continuous decision variables, rendering conventional optimization methods ineffective. To address this challenge, a hierarchical deep reinforcement learning (HDRL) framework is developed, where an upper-layer deep Q-network (DQN) determines discrete antenna port selection and a lower-layer twin delayed deep deterministic policy gradient (TD3) algorithm optimizes continuous beamforming and rate-splitting parameters. Numerical results demonstrate that the proposed approach significantly improves system performance, achieving a higher communication sum-rate while satisfying sensing requirements under dynamic propagation conditions.

20 pages, 6792 KB  
Article
PER-TD3 Integrated with HER Mechanism: Improving Training Efficiency and Control Accuracy for PEMFC Differential Pressure Control
by Yuan Li, Baijun Lai, Jing Wang, Yan Sun, Donghai Hu and Hua Ding
World Electr. Veh. J. 2026, 17(4), 195; https://doi.org/10.3390/wevj17040195 - 8 Apr 2026
Abstract
The cathode and anode differential pressure control of a proton exchange membrane fuel cell (PEMFC) directly affects its service life and operating efficiency. Existing control methods struggle to cope with strong nonlinear perturbations, and fixed differential pressure control is prone to pressure overshoot and threshold exceedance, resulting in unstable pressure regulation. To address these problems, a reinforcement learning method based on hybrid experience replay (HP-TD3) is proposed. A CART-based algorithm is first used to classify the states of the test load, and a load-related segmented reward function is designed. In addition, a hindsight experience replay (HER) mechanism is incorporated into the Priority Experience Replay Twin Delayed Deep Deterministic Policy Gradient (PER-TD3) framework to improve sample utilization efficiency and training stability. Finally, the performance of HP-TD3 and its ability to cope with nonlinear disturbances are verified on a fuel cell control unit hardware-in-the-loop (FCU-HIL) platform. Under test load A (frequent switching with a high proportion of low-load operation), HP-TD3 reduces the mean absolute error (MAE), root mean square error (RMSE), and the fuel cell dynamic performance degradation index (Δfc) by 17.4%, 20.5%, and 13.3%, respectively, compared with P-TD3; under test load B (high-load operation with low switching frequency), these indicators drop by 25.7%, 29.4%, and 15.4%, respectively.
(This article belongs to the Section Storage Systems)
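The HER mechanism incorporated here can be sketched as relabeling a finished episode with the goal it actually achieved, so transitions that failed on the original goal still carry useful reward signal. The tuple layout and `reward_fn` below are illustrative assumptions, not the paper's state or reward design.

```python
def her_relabel(episode, reward_fn):
    # Hindsight experience replay: substitute the goal the episode
    # actually reached (its final achieved state) and recompute rewards,
    # turning a failed rollout into informative training data.
    achieved = episode[-1][2]  # final next_state becomes the new goal
    relabeled = []
    for state, action, next_state, _original_goal in episode:
        r = reward_fn(next_state, achieved)
        relabeled.append((state, action, next_state, achieved, r))
    return relabeled

# Sparse reward: 0 on reaching the goal, -1 otherwise (illustrative).
reward_fn = lambda s, g: 0.0 if s == g else -1.0
episode = [(0, "a", 1, 9), (1, "b", 2, 9)]  # original goal 9 never reached
relabeled = her_relabel(episode, reward_fn)
```

Both the original and relabeled transitions would then be pushed into the (prioritized) replay buffer, which is what improves sample utilization.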

28 pages, 3267 KB  
Article
A Hierarchical Dynamic Path Planning Framework for Autonomous Vehicles Based on Physics-Informed Potential Field and TD3 Reinforcement Learning
by Yan Pan, Yu Wang and Bin Ran
Appl. Sci. 2026, 16(7), 3610; https://doi.org/10.3390/app16073610 - 7 Apr 2026
Abstract
Autonomous driving in dense traffic demands policies that ensure safety, accurate path tracking, and ride comfort, yet reinforcement learning (RL) alone suffers from low sample efficiency and weak safety guarantees, while classical artificial potential field (APF) methods lack adaptability to dynamic scenarios. This paper proposes PIPF-TD3, which integrates APF theory with the Twin Delayed Deep Deterministic Policy Gradient (TD3) by embedding composite potential values and Doppler-weighted gradients as physics-informed features into the state vector. A Hybrid A* planner generates a reference path encoded as an attractive field; repulsive fields model nearby obstacles using real-time perception data; and a multi-objective reward function jointly optimizes path tracking, collision avoidance, and ride comfort. Experiments in CARLA 0.9.14 across two scenarios—a highway segment with mixed obstacles and a signalized intersection with conflicting turning movements—show that PIPF-TD3 achieves 100% task completion with zero collisions, whereas TD3 without potential field guidance suffers a 90% collision rate. PIPF-TD3 reduces mean cross-track error to 0.12 m (a 72.1% reduction over the rule-based FSM baseline), maintains a 67.0% larger safety clearance, and yields RMS longitudinal and lateral accelerations of 1.12 and 0.75 m/s², outperforming the FSM by 37.1% and 42.7%. These results confirm that Doppler-weighted physical priors substantially enhance RL-based driving safety and quality in complex traffic conditions.
(This article belongs to the Section Transportation and Future Mobility)
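A generic composite potential of the kind embedded in the state vector can be sketched as a quadratic attraction to the reference plus inverse-distance repulsion inside an influence radius `d0`. This is the textbook APF form under assumed gains; PIPF-TD3's Doppler weighting and Hybrid A* reference encoding are not reproduced here.

```python
import math

def potential_value(pos, goal, obstacles, k_att=1.0, k_rep=1.0, d0=2.0):
    # Classic artificial potential field: quadratic attraction toward
    # the goal/reference plus repulsion from obstacles inside the
    # influence radius d0. Gains k_att, k_rep are illustrative.
    u_att = 0.5 * k_att * math.dist(pos, goal) ** 2
    u_rep = 0.0
    for obs in obstacles:
        d = math.dist(pos, obs)
        if 0.0 < d < d0:
            u_rep += 0.5 * k_rep * (1.0 / d - 1.0 / d0) ** 2
    return u_att + u_rep

# Far from every obstacle, only the attractive term contributes.
u_free = potential_value((0.0, 0.0), (3.0, 4.0), obstacles=[(10.0, 10.0)])
```

Feeding such potential values (and their gradients) to the policy gives the RL agent a physics-informed summary of goal pull and obstacle pressure at its current state.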

24 pages, 17819 KB  
Article
GT-TD3: A Kinematics-Aware Graph-Transformer Framework for Stable Trajectory Tracking of High-Degree-of-Freedom (DOF) Manipulators
by Hanwen Miao, Haoran Hou, Zhaopeng Zhu, Zheng Chao and Rui Zhang
Machines 2026, 14(4), 397; https://doi.org/10.3390/machines14040397 - 5 Apr 2026
Abstract
Accurate trajectory tracking of redundant manipulators is difficult because the controller must simultaneously model local couplings between adjacent joints and global dependencies across the whole kinematic chain. Existing reinforcement learning methods typically employ multilayer perceptrons, which do not explicitly exploit manipulator structure and therefore show limited stability and representation ability in high-dimensional continuous control tasks. This paper proposes GT-TD3, a Graph-Transformer-enhanced Twin Delayed Deep Deterministic Policy Gradient framework, for redundant manipulator trajectory tracking. The proposed actor first converts the raw system state into joint-level node features and uses a graph neural network to extract local kinematic coupling information. A Transformer is then employed to capture long-range dependencies among joints. To strengthen the use of structural priors, topology- and distance-related bias terms are incorporated into the attention mechanism, enabling the network to encode manipulator structure during global feature learning. Experiments on a 7-DoF KUKA iiwa manipulator in PyBullet demonstrate that GT-TD3 outperforms MLP, pure GNN, and pure Transformer baselines in tracking performance. The proposed method achieves more stable training, faster convergence, and smoother and more accurate end-effector motion. The results show that the integration of local graph modeling and structure-aware global attention provides an effective solution for high-precision trajectory tracking of redundant manipulators.
(This article belongs to the Section Robotics, Mechatronics and Intelligent Machines)
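The structure-aware attention idea — adding topology- and distance-related bias terms to the attention logits before the softmax — can be sketched as follows. The bias values and plain-list implementation are illustrative, not the GT-TD3 parameterization.

```python
import math

def biased_attention(scores, topo_bias, dist_bias):
    # Add structural bias terms to the raw attention logits before the
    # softmax, so joints adjacent in the kinematic chain (or close in
    # graph distance) receive more attention weight.
    out = []
    for row_s, row_t, row_d in zip(scores, topo_bias, dist_bias):
        logits = [s + t + d for s, t, d in zip(row_s, row_t, row_d)]
        m = max(logits)                      # stabilize the softmax
        exps = [math.exp(v - m) for v in logits]
        z = sum(exps)
        out.append([e / z for e in exps])
    return out

attn = biased_attention(
    scores=[[0.0, 0.0]],       # content similarity alone is a tie
    topo_bias=[[1.0, 0.0]],    # first joint is an adjacent neighbour
    dist_bias=[[0.0, 0.0]],
)
```

With equal content scores, the topology bias alone breaks the tie toward the kinematically adjacent joint.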

17 pages, 1288 KB  
Article
An Energy Management Optimization Method for Arctic Space Environment Monitoring Buoys Based on Deep Reinforcement Learning
by Hui Zhu, Bingrui Li, Yan Chen, Yinke Dou, Yi Tian, Yahao Li, Huiguang Li and Zepeng Gao
Energies 2026, 19(6), 1487; https://doi.org/10.3390/en19061487 - 17 Mar 2026
Abstract
To address the long-term operational challenges of space environment monitoring buoys under extreme Arctic conditions, this paper proposes an energy management optimization method based on deep reinforcement learning (DRL). By constructing a buoy system model that integrates renewable energy sources, a primary lithium battery power supply, and a battery energy storage unit, combined with an Arctic environmental model incorporating low-temperature efficiency degradation, a reward function was designed to minimize power supply deficits while ensuring system reliability. The Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm was employed to optimize energy scheduling strategies. Simulation results based on real Arctic data (August 2024–January 2025) demonstrate that integrating wind turbines significantly reduces reliance on primary lithium batteries. Specifically, the required lithium battery capacity was reduced by 87.5% (from 61.44 kWh to 7.685 kWh), and procurement costs were lowered by approximately $68,830 compared to non-rechargeable schemes. This method significantly enhances the buoy’s endurance and scheduling intelligence, offering valuable insights into energy management for intelligent polar observation equipment.
(This article belongs to the Section F5: Artificial Intelligence and Smart Energy)

30 pages, 1414 KB  
Article
Graph-Attention Constrained DRL for Joint Task Offloading and Resource Allocation in UAV-Assisted Internet of Vehicles
by Peiying Zhang, Xiangguo Zheng, Konstantin Igorevich Kostromitin, Wei Zhang, Huiling Shi and Lizhuang Tan
Drones 2026, 10(3), 201; https://doi.org/10.3390/drones10030201 - 13 Mar 2026
Abstract
Unmanned aerial vehicles (UAVs) acting as mobile aerial edge platforms can deliver on-demand communication and computing for the Internet of Vehicles (IoV) via flexible deployment and line-of-sight (LoS) links, improving reliability and reducing latency. However, high vehicle mobility, time-varying channels, and limited onboard energy make task offloading and resource coordination challenging. This paper studies joint task offloading and resource allocation in a UAV-assisted IoV system, where the UAV selects its hovering position from discrete candidate sites each time slot and splits vehicular tasks between the UAV and a roadside unit (RSU) to relieve backhaul congestion and enhance edge resource utilization. Considering vehicle mobility, multi-stage queue dynamics, and UAV energy consumption for communication, computation, and movement, the online optimization of position selection, task splitting, and bandwidth allocation is formulated as a constrained Markov decision process (CMDP). The goal is to maximize the number of tasks completed within the latency deadlines while satisfying the UAV energy budget. To solve this CMDP, we propose a graph-attention-based constrained twin delayed deep deterministic policy gradient (GAT-CTD3) algorithm. A graph attention network captures spatial correlations and resource competition among active vehicles, while a Lagrangian TD3 framework enforces long-term energy constraints and improves learning stability via twin critics, delayed policy updates, and target smoothing. Simulation results demonstrate that GAT-CTD3 outperforms the comparison schemes in task completion rate, delay, and energy consumption per completed task, and remains robust under dense traffic.
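The Lagrangian treatment of the long-term energy budget can be sketched as projected gradient ascent on a dual variable: the multiplier grows while average energy use exceeds the budget and shrinks (never below zero) otherwise, scaling the energy penalty seen by the critic. The learning rate and episode statistics below are illustrative assumptions, not the paper's values.

```python
def update_dual(lmbda, avg_energy, budget, lr=0.05):
    # Projected dual ascent for a CMDP: the multiplier rises while the
    # energy constraint is violated and decays when it is satisfied.
    # The TD3 critic would then train on r - lmbda * energy_cost.
    lmbda += lr * (avg_energy - budget)
    return max(0.0, lmbda)  # dual variable stays non-negative

lmbda = 0.0
for avg_energy in [1.4, 1.3, 1.1, 0.9]:  # per-episode averages, budget 1.0
    lmbda = update_dual(lmbda, avg_energy, budget=1.0)
```

At convergence the multiplier settles where the policy just meets the budget, which is how a soft penalty enforces a long-term constraint.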

30 pages, 8205 KB  
Article
Path Planning for USVs in Complex Marine Environments Based on an Improved Hybrid TD3 Algorithm
by Zhenxing Zhang, Xiaohui Wang, Qiujie Wang, Mingwei Zhu and Mingkun Feng
Sensors 2026, 26(6), 1823; https://doi.org/10.3390/s26061823 - 13 Mar 2026
Abstract
Real-time path planning for Unmanned Surface Vehicles (USVs) in complex marine environments remains challenging due to unstructured environments, ocean current disturbances, and dynamic obstacles. This paper proposes an improved Hybrid Safety and Reward-Sensitive Twin Delayed Deep Deterministic Policy Gradient (H_RS_TD3) algorithm and constructs a high-fidelity simulation environment based on GEBCO bathymetric data and CMEMS ocean current data. The path planning problem is formulated as a Markov Decision Process (MDP), where the state space incorporates multi-beam radar perception, ocean current disturbances, and relative goal information, while the action space outputs continuous thrust and rudder commands subject to vehicle dynamics constraints. The proposed framework integrates a risk-aware hybrid safety decision architecture, a Trajectory Predictor Network (TPN), a Curvature-driven Advantage-based Prioritized Experience Replay (CDA-PER) mechanism, and an uncertainty-aware conservative Q-learning strategy to enhance navigation safety, sample efficiency, and policy stability. Comprehensive simulations demonstrate that, compared with baseline deep reinforcement learning methods, the proposed approach achieves faster convergence, improved stability, and competitive path efficiency while consistently maintaining sufficient obstacle clearance and millisecond-level inference latency, validating its effectiveness and practical feasibility for safe USV navigation in realistic dynamic marine environments.
(This article belongs to the Section Navigation and Positioning)
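Prioritized experience replay of the kind CDA-PER builds on samples transitions with probability proportional to a power of their priority. The sketch below shows only the generic sampling rule, with a toy linear scan (real implementations use a sum-tree); the curvature-driven advantage priorities themselves are not modeled.

```python
import random

def sample_index(priorities, alpha=0.6, rng=random):
    # Generic PER sampling: P(i) proportional to priority_i ** alpha.
    # Linear scan for clarity; production code uses a sum-tree for
    # O(log n) draws. alpha interpolates uniform (0) to greedy (1).
    weights = [p ** alpha for p in priorities]
    total = sum(weights)
    r = rng.random() * total
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if r <= acc:
            return i
    return len(weights) - 1

random.seed(0)
counts = [0, 0]
for _ in range(1000):
    counts[sample_index([1.0, 100.0])] += 1
# The high-priority transition dominates the draws.
```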

25 pages, 6530 KB  
Article
Reinforcement Learning-Based Energy Storage Management for Microgrid Power Exchanges
by Federico Perquoti, Davide Milillo, Lorenzo Sabino, Michele Quercio, Francesco Riganti Fulginei, George Cristian Lazaroiu and Fabio Crescimbini
Eng 2026, 7(3), 126; https://doi.org/10.3390/eng7030126 - 9 Mar 2026
Abstract
Intelligent energy management systems are increasingly necessary for integrating renewable energy sources within microgrids. This paper investigates the application of a reinforcement learning (RL) neural network to optimize the operation of an electrochemical storage system in an environment composed of residential loads, commercial loads, and a photovoltaic plant, all connected to the grid. A dataset combining market purchase prices, photovoltaic generation, and residential and commercial load profiles was generated and used to train a Twin Delayed Deep Deterministic Policy Gradient (TD3) agent with the primary goal of deriving a reliable and adaptive post-training policy capable of maximizing photovoltaic self-consumption, minimizing operational costs through intelligent price arbitrage, and ensuring strict compliance with battery physical constraints. The system state includes battery state of charge, load demand, PV generation, and normalized market purchase prices, whereas the action represents the battery’s charge/discharge power, which is restricted from exporting energy to the grid. Results show that the agent learns to effectively store surplus PV energy and minimize grid dependency through dynamic charge management. The proposed approach outperforms strategies based solely on storing surplus self-generated energy and maintains the battery within safe operational limits. Tests with previously unseen data demonstrate robust, adaptive, and economically efficient energy management, highlighting the potential of reinforcement learning in intelligent energy systems.

25 pages, 1793 KB  
Article
Computing Efficiency Optimization for UAV-Enabled Integrated Sensing, Computing, and Communication: A Memory-Based Deep Reinforcement Learning Approach
by Honghao Qi and Muqing Wu
Drones 2026, 10(3), 180; https://doi.org/10.3390/drones10030180 - 6 Mar 2026
Abstract
Unmanned aerial vehicles (UAVs) have emerged as a promising platform for supporting integrated sensing, computing, and communication (ISCC) functionality in Internet of Things (IoT) applications. This paper investigates a UAV-enabled ISCC network, where the UAV performs radar sensing and onboard edge computing with the computational assistance of ground access points (APs). Given the limited onboard energy, ensuring energy-efficient operation of UAVs is crucial to support the long-term sustainability of network performance. In this paper, we define computing efficiency as the ratio between the total number of successfully processed computational bits and the overall UAV energy consumption, under the constraint of a required sensing threshold. To maximize this performance metric, this paper jointly optimizes the beamforming vector, the CPU frequency, and the trajectory of the UAV. This optimization problem is modeled as a Markov decision process (MDP) and solved using a deep reinforcement learning (DRL) approach based on a memory mechanism. Specifically, a long short-term memory (LSTM) and twin delayed deep deterministic policy gradient (TD3)-based trajectory design and resource allocation (LTTDRA) algorithm is proposed. LSTM units are integrated into the actor and critic to effectively capture the temporal correlations in dynamic environments, thereby enhancing policy stability and accelerating algorithm convergence. The reward function is meticulously designed to alleviate sparse-penalty effects and learn high-performance strategies in complex environments with multiple constraints. Extensive simulations are conducted under various settings and network scenarios, and the results consistently indicate that the proposed approach substantially outperforms the baseline schemes.
(This article belongs to the Special Issue Advances in UAV Networks Towards 6G)

25 pages, 5606 KB  
Article
Health-Aware Differentiated Energy Management for Multi-Stack Fuel Cell Hybrid Power Systems on Ships
by Lin Zhu, Yancheng Liu, Haohao Guo and Siyuan Liu
J. Mar. Sci. Eng. 2026, 14(5), 460; https://doi.org/10.3390/jmse14050460 - 28 Feb 2026
Abstract
This study proposes a health-aware energy management strategy based on the twin delayed deep deterministic policy gradient (TD3) algorithm for hybrid fuel cell/battery-powered ships. Unlike traditional approaches that treat multiple fuel cell stacks as homogeneous units, this strategy innovatively implements differentiated power allocation based on the real-time state of health of each stack. The research first validates the superiority of the TD3 framework over the deep Q-learning framework at the algorithmic level. Further comparative experiments conducted across three scenarios with varying degrees of state of health differences show that, compared to the TD3 baseline strategy employing average power allocation, the health-aware differentiated TD3 strategy significantly reduces the total voyage cost of the system, with the cost-saving effect becoming more pronounced as the state of health disparity between stacks increases. Additionally, by incorporating rule-based constraints, the convergence speed of the TD3 algorithm is effectively enhanced, improving its feasibility for real-time control. Tests under dynamic and fluctuating load conditions further confirm the strategy’s effectiveness and applicability. In summary, the health-aware TD3 strategy proposed in this study not only provides an efficient and reliable energy management solution for hybrid-powered ships but also promotes the application of machine learning in the field of ship energy management.
(This article belongs to the Section Ocean Engineering)

19 pages, 3606 KB  
Article
Autonomous Navigation of an Unmanned Underwater Vehicle via Safe Reinforcement Learning and Active Disturbance Rejection Control
by Qinze Chen, Yun Cheng, Yinlong Yuan and Liang Hua
J. Mar. Sci. Eng. 2026, 14(5), 425; https://doi.org/10.3390/jmse14050425 - 25 Feb 2026
Abstract
A two-layer control framework for unmanned underwater vehicle (UUV) navigation is proposed, combining a lower-layer active disturbance rejection controller (ADRC) with an upper-layer safe reinforcement learning (RL) policy for obstacle-avoidance navigation. The lower layer, utilizing ADRC, ensures high tracking accuracy and effective disturbance rejection, while the upper layer integrates the twin delayed deep deterministic policy gradient (TD3) algorithm, combined with a control barrier function (CBF)-based quadratic programming (QP) safety filter and safety-inspired reward shaping (SR). The method is evaluated in two simulation studies: (i) velocity and attitude control to assess tracking and disturbance rejection, and (ii) obstacle-avoidance navigation to assess learning efficiency, trajectory smoothness, and safety-related metrics. Simulation results show that ADRC achieves faster tracking and stronger disturbance rejection than a conventional proportional–integral–derivative (PID) controller. Moreover, the proposed TD3 + QP + SR scheme exhibits faster learning, smoother trajectories, and improved safety performance compared with RL baselines. These results indicate that the proposed framework enables efficient and safe UUV navigation in simulation scenarios with obstacles and disturbances.
(This article belongs to the Section Ocean Engineering)
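A CBF-based QP safety filter minimally modifies the RL action: it solves min ||u − u_RL||² subject to the barrier constraint. With one scalar control and one linear constraint the QP has a closed form, which the sketch below uses; deriving the coefficients `a` and `b` from the barrier h(x) and the UUV dynamics is omitted, and all names are illustrative assumptions.

```python
def cbf_filter(u_rl, a, b):
    # Minimal CBF-QP safety filter for a scalar control:
    #   minimize (u - u_rl)^2  subject to  a * u <= b.
    # If the RL action already satisfies the constraint it passes
    # through unchanged; otherwise it is projected onto the boundary
    # a * u = b (closed-form solution of the one-constraint QP).
    if a * u_rl <= b:
        return u_rl
    return b / a

safe = cbf_filter(u_rl=0.9, a=1.0, b=0.5)   # unsafe action is clipped
kept = cbf_filter(u_rl=0.2, a=1.0, b=0.5)   # safe action passes through
```

The filter is applied after the policy at every control step, so exploration never drives the vehicle out of the certified safe set; full CBF-QP formulations handle vector controls and multiple constraints with a QP solver.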
