Search Results (526)

Search Parameters:
Keywords = policy gradient reinforcement learning

37 pages, 3577 KB  
Article
Research on Energy-Saving and Efficiency-Improving Optimization of a Four-Way Shuttle-Based Dense Three-Dimensional Warehouse System Based on Two-Stage Deep Reinforcement Learning
by Yang Xiang, Xingyu Jin, Kaiqian Lei and Qin Zhang
Appl. Sci. 2025, 15(21), 11367; https://doi.org/10.3390/app152111367 - 23 Oct 2025
Abstract
In the context of rapid development within the logistics sector and widespread advocacy for sustainable development, this paper proposes enhancements to the task scheduling and path planning components of four-way shuttle systems. The focus lies on refining and innovating modeling approaches and algorithms to address issues in complex environments such as uneven task distribution, poor adaptability to dynamic conditions, and high rates of idle vehicle operation. These improvements aim to enhance system performance, reduce energy consumption, and achieve sustainable development. Therefore, this paper presents an energy-saving and efficiency-enhancing optimization study for a four-way shuttle-based high-density automated warehouse system, utilizing deep reinforcement learning. For task scheduling, a collaborative scheduling algorithm based on an Improved Genetic Algorithm (IGA) and Multi-Agent Deep Deterministic Policy Gradient (MADDPG) has been designed. For path planning, this paper presents the A*-DQN method, which integrates the A* algorithm with Deep Q-Networks (DQN). By combining multiple layout scenarios and adjusting various parameters, simulation experiments verified that the system error is within 5%. Compared to existing methods, the total task duration, path planning length, and energy consumption per order decreased by approximately 12.84%, 9.05%, and 16.68%, respectively. The four-way shuttle vehicle can complete order tasks with virtually no conflicts. The conclusions of this paper have been validated through simulation experiments. Full article
30 pages, 8790 KB  
Article
An Adaptive Framework for Remaining Useful Life Prediction Integrating Attention Mechanism and Deep Reinforcement Learning
by Yanhui Bai, Jiajia Du, Honghui Li, Xintao Bao, Linjun Li, Chun Zhang, Jiahe Yan, Renliang Wang and Yi Xu
Sensors 2025, 25(20), 6354; https://doi.org/10.3390/s25206354 - 14 Oct 2025
Viewed by 592
Abstract
The prediction of Remaining Useful Life (RUL) constitutes a vital aspect of Prognostics and Health Management (PHM), providing capabilities for the assessment of mechanical component health status and prediction of failure instances. Recent studies on feature extraction, time-series modeling, and multi-task learning have shown remarkable advancements. However, most deep learning (DL) techniques predominantly focus on unimodal data or static feature extraction techniques, resulting in a lack of RUL prediction methods that can effectively capture the individual differences among heterogeneous sensors and failure modes under complex operational conditions. To overcome these limitations, an adaptive RUL prediction framework named ADAPT-RULNet is proposed for mechanical components, integrating the feature extraction capabilities of attention-enhanced DL and the decision-making abilities of deep reinforcement learning (DRL) to achieve end-to-end optimization from raw data to accurate RUL prediction. Initially, Functional Alignment Resampling (FAR) is employed to generate high-quality functional signals; then, attention-enhanced Dynamic Time Warping (DTW) is leveraged to obtain individual degradation stages. Subsequently, an attention-enhanced hybrid multi-scale RUL prediction network is constructed to extract both local and global features from multi-format data. Furthermore, the network achieves optimal feature representation by adaptively fusing multi-source features through Bayesian methods. Finally, we innovatively introduce a Deep Deterministic Policy Gradient (DDPG) strategy from DRL to adaptively optimize key parameters in the construction of individual degradation stages and achieve a global balance between model complexity and prediction accuracy. The proposed model was evaluated on aircraft engines and railway freight car wheels. The results indicate that it achieves a lower average Root Mean Square Error (RMSE) and higher accuracy in comparison with current approaches. Moreover, the method shows strong potential for improving prediction accuracy and robustness in varied industrial applications. Full article

21 pages, 2648 KB  
Article
A Hybrid Reinforcement Learning Framework Combining TD3 and PID Control for Robust Trajectory Tracking of a 5-DOF Robotic Arm
by Zied Ben Hazem, Firas Saidi, Nivine Guler and Ali Husain Altaif
Automation 2025, 6(4), 56; https://doi.org/10.3390/automation6040056 - 14 Oct 2025
Viewed by 510
Abstract
This paper presents a hybrid reinforcement learning framework for trajectory tracking control of a 5-degree-of-freedom (DOF) Mitsubishi RV-2AJ robotic arm by integrating model-free deep reinforcement learning (DRL) algorithms with classical control strategies. A novel hybrid PID + TD3 agent is proposed, combining a Proportional–Integral–Derivative (PID) controller with the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm, and is compared against standalone TD3 and PID controllers. In this architecture, the PID controller provides baseline stability and deterministic disturbance rejection, while the TD3 agent learns residual corrections to enhance tracking accuracy, robustness, and control smoothness. The robotic system is modeled in MATLAB/Simulink with Simscape Multibody, and the agents are trained using a reward function inspired by artificial potential fields, promoting energy-efficient and precise motion. Extensive simulations are performed under internal disturbances (e.g., joint friction variations, payload changes) and external disturbances (e.g., unexpected forces, environmental interactions). Results demonstrate that the hybrid PID + TD3 approach outperforms both standalone TD3 and PID controllers in convergence speed, tracking precision, and disturbance rejection. This study highlights the effectiveness of combining reinforcement learning with classical control for intelligent, robust, and resilient robotic manipulation in uncertain environments. Full article
(This article belongs to the Topic New Trends in Robotics: Automation and Autonomous Systems)
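
The residual-learning architecture described above, where a PID loop supplies baseline stability and the RL agent only learns a corrective term, can be illustrated with a short sketch. This is a minimal illustration under assumed gains and dimensions, not the authors' implementation; the `residual_policy` stub stands in for a trained TD3 actor.

```python
import numpy as np

class PID:
    """Joint-space PID giving the baseline control signal."""
    def __init__(self, kp, ki, kd, dt, n_joints=5):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = np.zeros(n_joints)
        self.prev_error = np.zeros(n_joints)

    def control(self, error):
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

def residual_policy(state):
    """Stand-in for a trained TD3 actor; returns a small bounded correction."""
    return np.tanh(state[:5]) * 0.05

def hybrid_action(pid, q_ref, q, state, residual_scale=0.1):
    # The PID term guarantees a sensible baseline; the learned residual refines it.
    u_pid = pid.control(q_ref - q)
    u_res = residual_scale * residual_policy(state)
    return np.clip(u_pid + u_res, -1.0, 1.0)

pid = PID(kp=2.0, ki=0.1, kd=0.05, dt=0.01)
u = hybrid_action(pid, q_ref=np.zeros(5), q=np.full(5, 0.1), state=np.zeros(10))
```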

19 pages, 6362 KB  
Article
Micro-Platform Verification for LiDAR SLAM-Based Navigation of Mecanum-Wheeled Robot in Warehouse Environment
by Yue Wang, Ying Yu Ye, Wei Zhong, Bo Lin Gao, Chong Zhang Mu and Ning Zhao
World Electr. Veh. J. 2025, 16(10), 571; https://doi.org/10.3390/wevj16100571 - 8 Oct 2025
Viewed by 395
Abstract
Path navigation for mobile robots critically determines the operational efficiency of warehouse logistics systems. However, the current QR (Quick Response) code path navigation for warehouses suffers from low operational efficiency and poor dynamic adaptability in complex dynamic environments. This paper introduces a deep reinforcement learning and hybrid-algorithm SLAM (Simultaneous Localization and Mapping) path navigation method for Mecanum-wheeled robots, validated with an emphasis on dynamic adaptability and real-time performance. Based on the Gazebo warehouse simulation environment, the TD3 (Twin Delayed Deep Deterministic Policy Gradient) path planning method was established for offline training. Then, the Astar-Time Elastic Band (TEB) hybrid path planning algorithm was used to conduct experimental verification in static and dynamic real-world scenarios. Finally, experiments show that the TD3-based path planning for mobile robots makes effective decisions during offline training in the simulation environment, while Astar-TEB accurately completes path planning and navigates around both static and dynamic obstacles in real-world scenarios. Therefore, this verifies the feasibility and effectiveness of the proposed SLAM path navigation for Mecanum-wheeled mobile robots on a miniature warehouse platform. Full article
(This article belongs to the Special Issue Research on Intelligent Vehicle Path Planning Algorithm)

25 pages, 2714 KB  
Article
Evaluating Municipal Solid Waste Incineration Through Determining Flame Combustion to Improve Combustion Processes for Environmental Sanitation
by Jian Tang, Xiaoxian Yang, Wei Wang and Jian Rong
Sustainability 2025, 17(19), 8872; https://doi.org/10.3390/su17198872 - 4 Oct 2025
Viewed by 301
Abstract
Municipal solid waste (MSW) refers to solid and semi-solid waste generated during human production and daily activities. The process of incinerating such waste, known as municipal solid waste incineration (MSWI), serves as a critical method for reducing waste volume and recovering resources. Automatic online recognition of flame combustion status during MSWI is a key technical approach to ensuring system stability, addressing issues such as high pollution emissions, severe equipment wear, and low operational efficiency. However, when manually selecting optimized features and hyperparameters based on empirical experience, the MSWI flame combustion state recognition model suffers from high time consumption, strong dependency on expertise, and difficulty in adaptively obtaining optimal solutions. To address these challenges, this article proposes a method for constructing a flame combustion state recognition model optimized based on reinforcement learning (RL), long short-term memory (LSTM), and parallel differential evolution (PDE) algorithms, achieving collaborative optimization of deep features and model hyperparameters. First, the feature selection and hyperparameter optimization problem of the ViT-IDFC combustion state recognition model is transformed into an encoding design and optimization problem for the PDE algorithm. Then, the mutation and selection factors of the PDE algorithm are used as modeling inputs for LSTM, which predicts the optimal hyperparameters based on PDE outputs. Next, during the PDE-based optimization of the ViT-IDFC model, a policy gradient reinforcement learning method is applied to determine the parameters of the LSTM model. Finally, the optimized combustion state recognition model is obtained by identifying the feature selection parameters and hyperparameters of the ViT-IDFC model. Test results based on an industrial image dataset demonstrate that the proposed optimization algorithm improves the recognition performance of both left and right grate recognition models, with the left grate achieving a 0.51% increase in recognition accuracy and the right grate a 0.74% increase. Full article
(This article belongs to the Section Waste and Recycling)
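
The final optimization step above applies a policy gradient method to set model parameters. As a generic, self-contained illustration of the score-function (REINFORCE) update underlying such schemes, and not the authors' LSTM/PDE pipeline, the following sketch learns a softmax preference over a few hypothetical candidate settings:

```python
import numpy as np

rng = np.random.default_rng(0)
K = 4                       # number of candidate settings (illustrative)
theta = np.zeros(K)         # policy logits

def reward(a):
    # Hypothetical black-box score of candidate a (e.g., validation accuracy).
    return [0.2, 0.5, 0.9, 0.4][a] + 0.05 * rng.standard_normal()

for step in range(500):
    pi = np.exp(theta - theta.max()); pi /= pi.sum()   # softmax policy
    a = rng.choice(K, p=pi)
    r = reward(a)
    grad_logpi = -pi                                   # d/dtheta log softmax
    grad_logpi[a] += 1.0
    theta += 0.1 * r * grad_logpi                      # REINFORCE ascent step

print("learned preference:", np.argmax(theta))         # tends toward candidate 2
```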

17 pages, 6267 KB  
Article
Local and Remote Digital Pre-Distortion for 5G Power Amplifiers with Safe Deep Reinforcement Learning
by Christian Spano, Damiano Badini, Lorenzo Cazzella and Matteo Matteucci
Sensors 2025, 25(19), 6102; https://doi.org/10.3390/s25196102 - 3 Oct 2025
Viewed by 524
Abstract
The demand for higher data rates and energy efficiency in wireless communication systems drives power amplifiers (PAs) into nonlinear operation, causing signal distortions that hinder performance. Digital Pre-Distortion (DPD) addresses these distortions, but existing systems face challenges with complexity, adaptability, and resource limitations. This paper introduces DRL-DPD, a Deep Reinforcement Learning-based solution for DPD that aims to reduce computational burden, improve adaptation to dynamic environments, and minimize resource consumption. To ensure safety and regulatory compliance, we integrate an ad-hoc Safe Reinforcement Learning algorithm, CRE-DDPG (Cautious-Recoverable-Exploration Deep Deterministic Policy Gradient), which prevents ACLR measurements from falling below safety thresholds. Simulations and hardware experiments demonstrate the potential of DRL-DPD with CRE-DDPG to surpass current DPD limitations in both local and remote configurations, paving the way for more efficient communication systems, especially in the context of 5G and beyond. Full article

26 pages, 2589 KB  
Article
Vision-Based Adaptive Control of Robotic Arm Using MN-MD3+BC
by Xianxia Zhang, Junjie Wu and Chang Zhao
Appl. Sci. 2025, 15(19), 10569; https://doi.org/10.3390/app151910569 - 30 Sep 2025
Viewed by 295
Abstract
To address the reliance of traditional calibrated visual servo systems on precise model calibration, as well as the high training cost and low efficiency of online reinforcement learning, this paper proposes a Multi-Network Mean Delayed Deep Deterministic Policy Gradient Algorithm with Behavior Cloning (MN-MD3+BC) for uncalibrated visual adaptive control of robotic arms. The algorithm improves upon the Twin Delayed Deep Deterministic Policy Gradient (TD3) network framework by adopting an architecture with one actor network and three critic networks, along with corresponding target networks. By constructing a multi-critic network integration mechanism, the mean output of the networks is used as the final Q-value estimate, effectively reducing the estimation bias of a single critic network. Meanwhile, a behavior cloning regularization term is introduced to address the common distribution shift problem in offline reinforcement learning. Furthermore, to obtain a high-quality dataset, an innovative data recombination-driven dataset creation method is proposed, which reduces training costs and avoids the risks of real-world exploration. The trained policy network is embedded into the actual system as an adaptive controller, driving the robotic arm to gradually approach the target position through closed-loop control. The algorithm is applied to uncalibrated multi-degree-of-freedom robotic arm visual servo tasks, providing an adaptive and low-dependency solution for dynamic and complex scenarios. MATLAB simulations and experiments on the WPR1 platform demonstrate that, compared to traditional Jacobian matrix-based model-free methods, the proposed approach exhibits advantages in tracking accuracy, error convergence speed, and system stability. Full article
(This article belongs to the Special Issue Intelligent Control of Robotic System)
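
Two ingredients named in this abstract, averaging an ensemble of critics to reduce Q-value estimation bias and adding a behavior-cloning regularizer against distribution shift, can be sketched in a few lines in the spirit of TD3+BC. The network sizes, the BC weight, and the toy batch below are assumptions for illustration, not the MN-MD3+BC implementation.

```python
import torch
import torch.nn as nn

state_dim, action_dim = 12, 4                       # assumed dimensions

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                         nn.Linear(256, 256), nn.ReLU(),
                         nn.Linear(256, out_dim))

actor = nn.Sequential(mlp(state_dim, action_dim), nn.Tanh())
critics = nn.ModuleList([mlp(state_dim + action_dim, 1) for _ in range(3)])

def q_mean(state, action):
    # Ensemble-mean Q estimate over the three critics.
    sa = torch.cat([state, action], dim=-1)
    return torch.stack([c(sa) for c in critics]).mean(dim=0)

def actor_loss(state, dataset_action, bc_weight=2.5):
    pi = actor(state)
    q = q_mean(state, pi)
    # Normalized RL term plus a behavior-cloning regularizer toward dataset actions.
    lam = bc_weight / q.abs().mean().detach()
    return -(lam * q).mean() + nn.functional.mse_loss(pi, dataset_action)

# Toy offline batch to show the loss evaluates and backpropagates.
s = torch.randn(32, state_dim)
a = torch.rand(32, action_dim) * 2 - 1
actor_loss(s, a).backward()
```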

28 pages, 3341 KB  
Article
Research on Dynamic Energy Management Optimization of Park Integrated Energy System Based on Deep Reinforcement Learning
by Xinjian Jiang, Lei Zhang, Fuwang Li, Zhiru Li, Zhijian Ling and Zhenghui Zhao
Energies 2025, 18(19), 5172; https://doi.org/10.3390/en18195172 - 29 Sep 2025
Viewed by 345
Abstract
Under the background of energy transition, the Integrated Energy System (IES) of the park has become a key carrier for enhancing the consumption capacity of renewable energy due to its multi-energy complementary characteristics. However, the high proportion of wind and solar resource access and the fluctuation of diverse loads have led to the system facing dual uncertainty challenges, and traditional optimization methods struggle to adapt to dynamic and complex dispatching requirements. To this end, this paper proposes a new dynamic energy management method based on Deep Reinforcement Learning (DRL) and constructs an IES mixed-integer nonlinear programming model including wind power, photovoltaics, combined heat and power generation, and electric and thermal energy storage, with the goal of minimizing the operating cost of the system. By expressing the dispatching process as a Markov decision process, a state space covering wind and solar output, multiple loads, and energy storage states is defined, a continuous action space for unit output and energy storage control is constructed, and a reward function integrating economic cost and the penalty for renewable energy consumption is designed. The Deep Deterministic Policy Gradient (DDPG) and Deep Q-Network (DQN) algorithms were adopted to achieve policy optimization. This study is based on simulation rather than experimental validation, which aligns with the exploratory scope of this research. The simulation results show that the DDPG algorithm achieves an average weekly operating cost of 532,424 yuan in continuous action space scheduling, which is 8.6% lower than that of the DQN algorithm, and the standard deviation of the cost is reduced by 19.5%, indicating better robustness. Under source-load fluctuations of 10% to 30%, the DQN algorithm still keeps the cost fluctuation below 4.5%, highlighting the strong adaptability of DRL to uncertain environments. Therefore, this method has significant theoretical and practical value for promoting the intelligent transformation of the energy system. Full article
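
As a rough illustration of a reward that trades off operating cost against renewable utilization, in the spirit of the formulation above (interpreting the renewable-consumption term as a penalty on curtailed generation), a per-step reward might look like the following sketch; the prices and penalty weight are assumptions, not the paper's values.

```python
def step_reward(grid_import_kwh, grid_price, fuel_cost,
                renewable_avail_kwh, renewable_used_kwh,
                curtail_penalty=0.5):
    # Negative operating cost plus a penalty on unused (curtailed) renewables.
    operating_cost = grid_import_kwh * grid_price + fuel_cost
    curtailed = max(renewable_avail_kwh - renewable_used_kwh, 0.0)
    return -(operating_cost + curtail_penalty * curtailed)

# Example: importing 120 kWh at 0.8 yuan/kWh, 40 yuan of CHP fuel,
# and 15 kWh of PV left unused -> reward = -(96 + 40 + 7.5) = -143.5
r = step_reward(120, 0.8, 40.0, 65.0, 50.0)
```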

18 pages, 4509 KB  
Article
Reinforcement Learning Stabilization for Quadrotor UAVs via Lipschitz-Constrained Policy Regularization
by Jiale Quan, Weijun Hu, Xianlong Ma and Gang Chen
Drones 2025, 9(10), 675; https://doi.org/10.3390/drones9100675 - 26 Sep 2025
Viewed by 492
Abstract
Reinforcement learning (RL), and in particular Proximal Policy Optimization (PPO), has shown promise in high-precision quadrotor unmanned aerial vehicle (QUAV) control. However, the performance of PPO is highly sensitive to the choice of the clipping parameter, and inappropriate settings can lead to unstable training dynamics and excessive policy oscillations, which limit deployment in safety-critical aerial applications. To address this issue, we propose a stability-aware dynamic clipping parameter adjustment strategy, which adapts the clipping threshold ϵ_t in real time based on a stability variance metric S_t. This adaptive mechanism balances exploration and stability throughout the training process. Furthermore, we provide a Lipschitz continuity interpretation of the clipping mechanism, showing that its adaptation implicitly adjusts a bound on the policy update step, thereby offering a deterministic guarantee on the oscillation magnitude. Extensive simulation results demonstrate that the proposed method reduces policy variance by 45% and accelerates convergence compared to baseline PPO, resulting in smoother control responses and improved robustness under dynamic operating conditions. While developed within the PPO framework, the proposed approach is readily applicable to other on-policy policy gradient methods. Full article
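
The clipped PPO surrogate and a variance-driven adjustment of its threshold can be sketched as follows. The adaptation rule shown (tighten ϵ when the probability ratios become volatile, relax it when they are stable) is a simplified stand-in for the paper's S_t-based schedule, not the authors' exact method.

```python
import torch

def clipped_surrogate(logp_new, logp_old, advantages, eps):
    # Standard PPO clipped objective (to be maximized).
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * advantages
    return torch.min(unclipped, clipped).mean()

def adapt_epsilon(eps, ratio_var, target_var=0.01,
                  eps_min=0.05, eps_max=0.3, rate=0.05):
    # Tighten the trust region when update variance is high, relax it when stable.
    eps = eps - rate if ratio_var > target_var else eps + rate
    return float(min(max(eps, eps_min), eps_max))

# Toy usage with random data standing in for a rollout batch.
logp_old = torch.randn(256)
logp_new = logp_old + 0.05 * torch.randn(256)
adv = torch.randn(256)
eps = 0.2
loss = -clipped_surrogate(logp_new, logp_old, adv, eps)   # negate for gradient descent
eps = adapt_epsilon(eps, ratio_var=torch.exp(logp_new - logp_old).var().item())
```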

18 pages, 812 KB  
Article
Deep Reinforcement Learning for Adaptive Robotic Grasping and Post-Grasp Manipulation in Simulated Dynamic Environments
by Henrique C. Ferreira and Ramiro S. Barbosa
Future Internet 2025, 17(10), 437; https://doi.org/10.3390/fi17100437 - 26 Sep 2025
Viewed by 636
Abstract
This article presents a deep reinforcement learning (DRL) approach for adaptive robotic grasping in dynamic environments. We developed UR5GraspingEnv, a PyBullet-based simulation environment integrated with OpenAI Gym, to train a UR5 robotic arm with a Robotiq 2F-85 gripper. Soft Actor-Critic (SAC) and Proximal Policy Optimization (PPO) were implemented to learn robust grasping policies for randomly positioned objects. A tailored reward function, combining distance penalties with grasp and pose rewards, optimizes grasping and post-grasping tasks, enhanced by domain randomization. SAC achieves an 87% grasp success rate and 75% post-grasp success rate, outperforming PPO's 82% and 68%, with stable convergence over 100,000 timesteps. The system addresses post-grasping manipulation and sim-to-real transfer challenges, advancing industrial and assistive applications. Results demonstrate the feasibility of learning stable and goal-driven policies for single-arm robotic manipulation using minimal supervision. Both PPO and SAC yield competitive performance, with SAC exhibiting superior adaptability in cluttered or edge cases. These findings suggest that DRL, when carefully designed and monitored, can support scalable learning in manipulation tasks. Full article
(This article belongs to the Special Issue Artificial Intelligence and Control Systems for Industry 4.0 and 5.0)
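
A shaped grasping reward of the kind described here, a distance penalty toward the object plus grasp and post-grasp pose terms, can be sketched as below; the weights and thresholds are illustrative assumptions rather than the UR5GraspingEnv definition.

```python
import numpy as np

def grasp_reward(ee_pos, obj_pos, grasped, obj_pose_err,
                 w_dist=1.0, grasp_bonus=5.0, w_pose=2.0):
    dist = np.linalg.norm(np.asarray(ee_pos) - np.asarray(obj_pos))
    reward = -w_dist * dist                  # shaping: approach the object
    if grasped:
        reward += grasp_bonus                # sparse bonus on a secure grasp
        reward -= w_pose * obj_pose_err      # then track the post-grasp pose target
    return reward

r = grasp_reward(ee_pos=[0.4, 0.0, 0.2], obj_pos=[0.45, 0.05, 0.05],
                 grasped=True, obj_pose_err=0.12)
```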

25 pages, 2096 KB  
Article
A Fuzzy Multi-Objective Sustainable and Agile Supply Chain Model Based on Digital Twin and Internet of Things with Adaptive Learning Under Environmental Uncertainty
by Hamed Nozari, Agnieszka Szmelter-Jarosz and Dariusz Weiland
Appl. Sci. 2025, 15(19), 10399; https://doi.org/10.3390/app151910399 - 25 Sep 2025
Viewed by 420
Abstract
This paper presents an advanced, adaptive model for designing and optimizing agile and sustainable supply chains by integrating fuzzy multi-objective programming, Internet of Things (IoT), digital twin (DT) technologies, and reinforcement learning. Unlike conventional static models, the proposed framework utilizes real-time data and dynamically updates fuzzy parameters through a deep deterministic policy gradient (DDPG) algorithm. The model simultaneously addresses three conflicting objectives: minimizing cost, delivery time, and carbon emissions, while maximizing agility. To validate the model’s effectiveness, various optimization strategies including NSGA-II, MOPSO, and the Whale Optimization Algorithm are applied across small- to large-scale scenarios. Results demonstrate that the integration of IoT and DT, alongside adaptive learning, significantly improves decision accuracy, responsiveness, and sustainability. The model is particularly suited for high-volatility environments, offering decision-makers an intelligent, real-time support tool. Case study simulations further illustrate the model’s value in sectors such as urban logistics and humanitarian aid supply chains. Full article
(This article belongs to the Special Issue Applications of Artificial Intelligence in the IoT)

27 pages, 4674 KB  
Article
Design of a Robust Adaptive Cascade Fractional-Order Proportional–Integral–Derivative Controller Enhanced by Reinforcement Learning Algorithm for Speed Regulation of Brushless DC Motor in Electric Vehicles
by Seyyed Morteza Ghamari, Mehrdad Ghahramani, Daryoush Habibi and Asma Aziz
Energies 2025, 18(19), 5056; https://doi.org/10.3390/en18195056 - 23 Sep 2025
Viewed by 512
Abstract
Brushless DC (BLDC) motors are commonly used in electric vehicles (EVs) because of their efficiency, small size, and strong torque-speed performance. These motors offer additional benefits such as low maintenance, increased reliability, and high power density. Nevertheless, BLDC motors are highly nonlinear and their dynamics are complicated, particularly under changing load and supply conditions. These characteristics require robust and adaptable control methods that ensure performance over a broad spectrum of disturbances and uncertainties. To overcome these issues, this paper uses a Fractional-Order Proportional-Integral-Derivative (FOPID) controller, which offers better control precision, better frequency response, and extra degrees of freedom in tuning through its non-integer order terms. Despite these benefits, there are three primary drawbacks: (i) it is not real-time adaptable, (ii) it is hard to choose appropriate initial gain values, and (iii) it is sensitive to large disturbances and parameter changes. A new control framework is proposed to address these problems. First, a Reinforcement Learning (RL) approach based on Deep Deterministic Policy Gradient (DDPG) is presented to optimize the FOPID gains online so that the controller can continuously adjust itself to variations in the system. Second, the Snake Optimization (SO) algorithm is used to fine-tune the FOPID parameters at the initial stage to guarantee stable convergence. Lastly, a cascade control structure is adopted, with FOPID controllers in the inner (current) and outer (speed) loops. This structure adds robustness to the system as a whole and minimizes the effect of disturbances on performance. In addition, the cascade design allows more coordinated and smoother control actions, reducing stress on the power electronic switches, which lowers switching losses and improves the overall efficiency of the drive system. The proposed RL-enhanced cascade FOPID controller is verified by Hardware-in-the-Loop (HIL) testing, which shows better performance in speed regulation, robustness, and adaptability to realistic operating conditions in EV applications. Full article
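
The first element of this framework, an RL agent that retunes the controller online, reduces in practice to mapping a bounded actor output onto the five FOPID parameters. The sketch below shows that mapping only; the gain ranges and the random stand-in for a trained DDPG actor are assumptions for illustration.

```python
import numpy as np

GAIN_LOW = np.array([0.1, 0.01, 0.001, 0.3, 0.3])     # assumed lower bounds
GAIN_HIGH = np.array([20.0, 5.0, 1.0, 1.2, 1.2])      # assumed upper bounds

def action_to_fopid_gains(action):
    """Map a [-1, 1]^5 action vector to (Kp, Ki, Kd, lambda, mu)."""
    a = np.clip(np.asarray(action, dtype=float), -1.0, 1.0)
    return GAIN_LOW + (a + 1.0) / 2.0 * (GAIN_HIGH - GAIN_LOW)

# A trained actor would supply `action`; here a random vector stands in.
Kp, Ki, Kd, lam, mu = action_to_fopid_gains(np.random.uniform(-1, 1, size=5))
```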

19 pages, 1661 KB  
Article
A Reinforcement Learning-Based Approach for Distributed Photovoltaic Carrying Capacity Analysis in Distribution Grids
by Shumin Sun, Song Yang, Peng Yu, Yan Cheng, Jiawei Xing, Yuejiao Wang, Yu Yi, Zhanyang Hu, Liangzhong Yao and Xuanpei Pang
Energies 2025, 18(18), 5029; https://doi.org/10.3390/en18185029 - 22 Sep 2025
Viewed by 335
Abstract
Driven by the “double carbon” goals, the penetration rate of distributed photovoltaics (PV) in distribution networks has increased rapidly. However, the continuous growth of distributed PV installed capacity poses significant challenges to the carrying capacity of distribution networks. Reinforcement learning (RL), with its capability to handle high-dimensional nonlinear problems, plays a critical role in analyzing the carrying capacity of distribution networks. This study constructs an evaluation model for distributed PV carrying capacity and proposes a corresponding quantitative evaluation index system by analyzing the core factors influencing it. An optimization scheme based on deep reinforcement learning is adopted, introducing the Deep Deterministic Policy Gradient (DDPG) algorithm to solve the evaluation model. Finally, simulations on the IEEE 33-bus system validate the feasibility of the reinforcement learning approach for this problem. Full article

24 pages, 2157 KB  
Article
Research on Aerodynamic Force/Thrust Vector Combined Trajectory Optimization Method for Hypersonic Drones Based on Deep Reinforcement Learning
by Zijun Zhang, Yunfan Zhou, Leichao Yang, Wenzhong Jin and Jun Wang
Actuators 2025, 14(9), 461; https://doi.org/10.3390/act14090461 - 22 Sep 2025
Viewed by 412
Abstract
This paper addresses the cruise range maximization problem for hypersonic drones by proposing a combined aerodynamic force/thrust vector trajectory optimization method. A novel continuous linear parameterization strategy for trajectory optimization is developed, achieving continuous thrust vector trajectory optimization throughout the entire flight using only 21 parameters through a recursive linear function design. This approach reduces parameter dimensionality and effectively addresses sparse rewards and training difficulties in reinforcement learning. The study integrates the Deep Deterministic Policy Gradient (DDPG) algorithm with deep residual networks for trajectory optimization, systematically exploring the impact mechanisms of different aerodynamic force and thrust vector combination modes on range performance. Through collaborative trajectory optimization of thrust vectors and flight height, simulation results demonstrate that the combined trajectory optimization strategy achieves a total range enhancement of approximately 146.14 km compared to pure aerodynamic control, with continuous linearly parameterized thrust vector trajectory optimization providing superior performance over traditional segmented methods. These results verify the significant advantages of the proposed trajectory optimization approach and the effectiveness of the deep reinforcement learning framework. Full article
(This article belongs to the Section Aerospace Actuators)

27 pages, 9914 KB  
Article
Design of Robust Adaptive Nonlinear Backstepping Controller Enhanced by Deep Deterministic Policy Gradient Algorithm for Efficient Power Converter Regulation
by Seyyed Morteza Ghamari, Asma Aziz and Mehrdad Ghahramani
Energies 2025, 18(18), 4941; https://doi.org/10.3390/en18184941 - 17 Sep 2025
Viewed by 416
Abstract
Power converters play an important role in incorporating renewable energy sources into power systems. Among different converter designs, Buck and Boost converters are popular, as they use fewer components and deliver cost savings and high efficiency. However, Boost converters are non-minimum phase systems, which imposes harder constraints on the design of a robust controller. Developing an efficient controller for these topologies can be difficult since they exhibit nonlinearity and distortion in high-frequency modes. Lyapunov-based Adaptive Backstepping Control (ABSC) is used to regulate suitable outputs for these structures. This approach uses a Lyapunov stability function to provide increased stability and resistance to fluctuations in real-world conditions. However, in real-time operation, larger disturbances such as supply voltage changes, parameter variations, and noise may degrade the performance of this strategy. To increase the controller's flexibility under more demanding operating conditions, appropriate initial gains must be established. To address these concerns, the ABSC's performance is optimized using an adaptive Reinforcement Learning (RL) technique. RL has several advantages, including lower susceptibility to error, more reliable results obtained from data gathered from the environment, accurate model behavior within a given context, and better frequency matching in real-time applications. Random exploration, on the other hand, can have disastrous effects and produce unexpected results in real-world situations. As a result, we choose the Deep Deterministic Policy Gradient (DDPG) approach, which uses a deterministic policy rather than a stochastic one. Its key advantages include effective handling of continuous action spaces, improved sample efficiency through off-policy learning, and faster convergence via its actor–critic architecture that balances value estimation and policy optimization. Furthermore, this technique uses the Grey Wolf Optimization (GWO) algorithm to improve the initial set of gains, resulting in more reliable outcomes and quicker dynamics. The GWO technique is notable for its disciplined, nature-inspired approach, which leads to faster decision-making and greater accuracy than other optimization methods. This method treats the system as a black box without requiring its exact mathematical model, leading to lower complexity and computational burden. The effectiveness of this strategy is tested in both simulation and experimental scenarios using a Hardware-in-the-Loop (HIL) framework, with strong results and decreased error sensitivity. Full article
(This article belongs to the Special Issue Power Electronics for Smart Grids: Present and Future Perspectives II)
